cuSBF
cusbf::Filter< Config > Class Template Reference

cuSBF GPU-accelerated sectorized Bloom filter.

Classes

struct  Shard
 One 256-bit filter block stored as an array of Config::blockWordCount words.
 

Public Member Functions

 Filter (uint64_t requestedFilterBits)
 Constructs a Filter with at least requestedFilterBits bits of storage.
 
 Filter (const Filter &)=delete
 
Filter & operator= (const Filter &)=delete
 
 Filter (Filter &&)=default
 
Filter & operator= (Filter &&)=default
 
 ~Filter ()=default
 
uint64_t insertSequence (std::string_view sequence, cuda::stream_ref stream=cudaStream_t{})
 Inserts all valid k-mers from a host-resident sequence.
 
uint64_t insertSequenceDevice (device_span< const char > d_sequence, cuda::stream_ref stream=cudaStream_t{})
 Async insert of k-mers from a device-resident sequence.
 
FastxInsertReport insertRecordBatch (RecordBatchView batch, cuda::stream_ref stream=cudaStream_t{})
 Inserts a dense host-resident record batch.
 
FastxInsertReport insertFastx (std::istream &input, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{})
 Inserts all k-mers from a FASTA/FASTQ input stream.
 
FastxInsertReport insertFastxFile (std::string_view path, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{})
 Inserts all k-mers from a FASTA/FASTQ file via chunked streaming.
 
void containsSequenceDevice (device_span< const char > d_sequence, device_span< uint8_t > d_output, cuda::stream_ref stream=cudaStream_t{}) const
 Async query of k-mers from a device-resident sequence.
 
std::vector< uint8_t > containsSequence (std::string_view sequence, cuda::stream_ref stream=cudaStream_t{}) const
 Queries all valid k-mers from a host-resident sequence.
 
FastxQueryReport queryRecordBatch (RecordBatchView batch, cuda::stream_ref stream=cudaStream_t{}) const
 Queries a dense host-resident record batch and returns aggregate counts.
 
template<typename Consumer >
FastxQueryReport queryRecordBatch (RecordBatchView batch, Consumer &&consume, cuda::stream_ref stream=cudaStream_t{}) const
 Queries a dense host-resident record batch and streams per-record results.
 
FastxQueryReport queryFastx (std::istream &input, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries all k-mers from a FASTA/FASTQ input stream via chunked streaming.
 
FastxQueryReport queryFastxFile (std::string_view path, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries all k-mers from a FASTA/FASTQ file via chunked streaming.
 
template<typename Consumer >
FastxQueryReport queryFastxRecords (std::istream &input, Consumer &&consume, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries a FASTA/FASTQ stream and emits one record result per parsed record.
 
template<typename Consumer >
FastxQueryReport queryFastxFileRecords (std::string_view path, Consumer &&consume, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries a FASTA/FASTQ file and emits one record result per parsed record.
 
FastxDetailedQueryReport queryFastxDetailed (std::istream &input, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries all k-mers from a FASTA/FASTQ input stream via chunked streaming and preserves per-record hit vectors.
 
FastxDetailedQueryReport queryFastxFileDetailed (std::string_view path, double fillFraction=0.7, cuda::stream_ref stream=cudaStream_t{}) const
 Queries all k-mers from a FASTA/FASTQ file via chunked streaming and preserves per-record hit vectors.
 
void clear (cuda::stream_ref stream=cudaStream_t{})
 Resets all filter bits to zero and synchronises the stream.
 
float loadFactor () const
 Computes the fraction of set bits in the filter.
 
uint64_t filterBits () const
 Returns the total allocated capacity of the filter in bits.
 
uint64_t numShards () const
 Returns the number of shards.
 

Detailed Description

template<typename Config>
class cusbf::Filter< Config >

cuSBF GPU-accelerated sectorized Bloom filter.

Stores a device-resident filter divided into numShards 256-bit shards. Each shard is addressed independently by a minimizer-derived hash, and bits within a shard are set or tested using a set of s-mer-derived hashes.

The filter is not copyable (device memory ownership). Move construction and assignment are supported.

Template Parameters
Config  A cusbf::Config specialisation.

Definition at line 338 of file BloomFilter.cuh.
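
As a rough host-side illustration of the layout described above (not the library's kernels), the sketch below selects one 256-bit shard from a shard key and then sets or tests bits inside that shard only. HostShard, HostBlockedFilter, and mix64 are hypothetical names. The mixer is the SplitMix64 finalizer, used here as a stand-in for the minimizer-derived and s-mer-derived hashes, whose exact form this page does not specify.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical host-side shard: 4 x 64-bit words = 256 bits.
struct HostShard {
    std::array<std::uint64_t, 4> words{};
};

// SplitMix64 finalizer, an assumed stand-in for the library's hash functions.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hypothetical blocked Bloom filter; numShards must be a power of two.
class HostBlockedFilter {
public:
    explicit HostBlockedFilter(std::uint64_t numShardsPow2) : shards_(numShardsPow2) {}

    // shardKey plays the role of the minimizer, bitKeys the role of the s-mers.
    void insert(std::uint64_t shardKey, const std::vector<std::uint64_t>& bitKeys) {
        HostShard& shard = shards_[mix64(shardKey) & (shards_.size() - 1)];
        for (std::uint64_t key : bitKeys) {
            const std::uint64_t bit = mix64(key) & 255u;  // position inside the shard
            shard.words[bit >> 6] |= std::uint64_t{1} << (bit & 63u);
        }
    }

    bool contains(std::uint64_t shardKey, const std::vector<std::uint64_t>& bitKeys) const {
        const HostShard& shard = shards_[mix64(shardKey) & (shards_.size() - 1)];
        for (std::uint64_t key : bitKeys) {
            const std::uint64_t bit = mix64(key) & 255u;
            if ((shard.words[bit >> 6] & (std::uint64_t{1} << (bit & 63u))) == 0) {
                return false;  // one missing bit proves absence
            }
        }
        return true;  // all bits set: present, or a false positive
    }

private:
    std::vector<HostShard> shards_;
};
```

Because every bit for a key lives in a single shard, a lookup touches only one 256-bit block, which is the usual motivation for blocked or sectorized designs.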

Constructor & Destructor Documentation

◆ Filter() [1/3]

template<typename Config >
cusbf::Filter< Config >::Filter ( uint64_t  requestedFilterBits)
inline explicit

Constructs a Filter with at least requestedFilterBits bits of storage.

The shard count is ceil(requestedFilterBits / Config::filterBlockBits), with a minimum of 1, rounded up to the next power of two, so the allocated capacity can exceed the request.

Parameters
requestedFilterBits  Desired filter capacity in bits.

Definition at line 487 of file BloomFilter.cuh.

488 : numShards_(
489 cuda::std::bit_ceil(
490 std::max<uint64_t>(
491 1,
492 cuda::ceil_div(requestedFilterBits, Config::filterBlockBits)
493 )
494 )
495 ),
496 filterBits_(numShards_ * Config::filterBlockBits),
497 d_shards_(numShards_) {
498 clear();
499 }
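
The rounding in the initialiser list above can be reproduced on the host to predict the allocated capacity. This is a minimal sketch that assumes Config::filterBlockBits is 256 (taken from the "256-bit shards" wording); kFilterBlockBits, nextPowerOfTwo, allocatedShards, and allocatedBits are hypothetical helpers, and the loop replaces cuda::std::bit_ceil so the snippet compiles without CUDA.

```cpp
#include <cstdint>

// Assumed block size in bits; the real value comes from Config::filterBlockBits.
constexpr std::uint64_t kFilterBlockBits = 256;

// Smallest power of two >= v (valid for 1 <= v <= 2^63).
constexpr std::uint64_t nextPowerOfTwo(std::uint64_t v) {
    std::uint64_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// Mirrors the constructor: ceil-divide, clamp to at least 1, round up to a power of two.
constexpr std::uint64_t allocatedShards(std::uint64_t requestedFilterBits) {
    const std::uint64_t blocks =
        requestedFilterBits / kFilterBlockBits + (requestedFilterBits % kFilterBlockBits != 0 ? 1 : 0);
    return nextPowerOfTwo(blocks < 1 ? 1 : blocks);
}

constexpr std::uint64_t allocatedBits(std::uint64_t requestedFilterBits) {
    return allocatedShards(requestedFilterBits) * kFilterBlockBits;
}
```

Under these assumptions a request for 1281 bits needs 6 blocks, which rounds up to 8 shards and 2048 bits, so requests just above a power-of-two boundary can nearly double the allocation.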

◆ Filter() [2/3]

template<typename Config >
cusbf::Filter< Config >::Filter ( const Filter< Config > &  )
delete

◆ Filter() [3/3]

template<typename Config >
cusbf::Filter< Config >::Filter ( Filter< Config > &&  )
default

◆ ~Filter()

template<typename Config >
cusbf::Filter< Config >::~Filter ( )
default

Member Function Documentation

◆ clear()

template<typename Config >
void cusbf::Filter< Config >::clear ( cuda::stream_ref  stream = cudaStream_t{})
inline

Resets all filter bits to zero and synchronises the stream.

Parameters
stream  CUDA stream to use.

Definition at line 823 of file BloomFilter.cuh.

823 {}) {
824 CUSBF_CUDA_CALL(cudaMemsetAsync(
825 thrust::raw_pointer_cast(d_shards_.data()),
826 0,
827 d_shards_.size() * sizeof(Shard),
828 stream.get()
829 ));
830
831 stream.sync();
832 }

◆ containsSequence()

template<typename Config >
std::vector< uint8_t > cusbf::Filter< Config >::containsSequence ( std::string_view  sequence,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries all valid k-mers from a host-resident sequence.

Copies the sequence to device, queries, copies results back, and synchronises. The returned vector has one byte per k-mer: 1 = present, 0 = absent.

Parameters
sequence  Raw nucleotide sequence.
stream  CUDA stream to use.
Returns
Per-k-mer membership results (empty if sequence length < k).

Definition at line 651 of file BloomFilter.cuh.

651 {}) const {
652 if (recordSymbolCount(sequence.size()) < Config::k) {
653 return {};
654 }
655
656 std::vector<uint8_t> output(recordKmerCount(sequence.size()));
657
658 const auto d_sequence = stagedSequenceView({sequence.data(), sequence.size()}, stream);
659 ensureResultCapacity(output.size());
660 launchContainsSequence(
661 d_sequence,
662 device_span<uint8_t>{thrust::raw_pointer_cast(d_resultBuffer_.data()), output.size()},
663 stream
664 );
665 CUSBF_CUDA_CALL(cudaMemcpyAsync(
666 output.data(),
667 thrust::raw_pointer_cast(d_resultBuffer_.data()),
668 output.size() * sizeof(uint8_t),
669 cudaMemcpyDeviceToHost,
670 stream.get()
671 ));
672
673 stream.sync();
674 return output;
675 }
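
To make the shape of the returned vector concrete, the sketch below computes the usual sliding-window count (length - k + 1) and summarises a hit vector of the kind described above. kmerWindowCount and countHits are hypothetical helpers; the library's internal recordKmerCount may account for details not shown on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of k-mer windows in a sequence of the given length (0 if shorter than k).
inline std::size_t kmerWindowCount(std::size_t sequenceLength, std::size_t k) {
    if (k == 0 || sequenceLength < k) {
        return 0;
    }
    return sequenceLength - k + 1;
}

// Counts windows reported as present (byte value 1) in a per-k-mer result vector.
inline std::size_t countHits(const std::vector<std::uint8_t>& results) {
    std::size_t hits = 0;
    for (std::uint8_t r : results) {
        hits += (r != 0) ? 1 : 0;
    }
    return hits;
}
```

Entry i of the result corresponds to the window starting at offset i of the input sequence, so a caller can map hits back to positions directly.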

◆ containsSequenceDevice()

template<typename Config >
void cusbf::Filter< Config >::containsSequenceDevice ( device_span< const char >  d_sequence,
device_span< uint8_t >  d_output,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Async query of k-mers from a device-resident sequence.

Does not synchronise the stream. Results are written to d_output (one byte per k-mer: 1 = present, 0 = absent).

Parameters
d_sequence  Device-resident nucleotide sequence.
d_output  Per-k-mer result buffer (must hold kmerCount() bytes).
stream  CUDA stream to use.

Definition at line 627 of file BloomFilter.cuh.

630 {}
631 ) const {
632 if (sequenceKmerCount(d_sequence) == 0) {
633 return;
634 }
635
636 launchContainsSequence(d_sequence, d_output, stream);
637 }

◆ filterBits()

template<typename Config >
uint64_t cusbf::Filter< Config >::filterBits ( ) const
inline

Returns the total allocated capacity of the filter in bits.

Definition at line 855 of file BloomFilter.cuh.

855 {
856 return filterBits_;
857 }

◆ insertFastx()

template<typename Config >
FastxInsertReport cusbf::Filter< Config >::insertFastx ( std::istream &  input,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
)
inline

Inserts all k-mers from a FASTA/FASTQ input stream.

Reads records in streaming fashion, accumulating them until the concatenated sequence approaches fillFraction of free GPU memory, then inserts each chunk independently.

Parameters
input  Input stream containing FASTA or FASTQ records.
fillFraction  Fraction of free GPU memory to fill per chunk (default 0.7).
stream  CUDA stream to use.
Returns
Report summarising records indexed, bases processed, and k-mers inserted.

Definition at line 595 of file BloomFilter.cuh.

598 {}
599 ) {
600 return insertFastxStream(input, "<stream>", fillFraction, stream);
601 }
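
The chunking policy described above can be pictured with a small host-side sketch. It assumes a simple greedy rule: keep adding records until the next one would push the chunk past fillFraction of the free memory, then start a new chunk. chunkRecords is a hypothetical helper, and the library's actual accounting (staging buffers, separators, result storage) is not documented on this page.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Groups record indices into chunks whose total size stays within
// fillFraction * freeBytes. A record larger than the budget gets its own chunk.
inline std::vector<std::vector<std::size_t>> chunkRecords(
    const std::vector<std::size_t>& recordSizes,
    std::size_t freeBytes,
    double fillFraction) {
    const auto budget =
        static_cast<std::size_t>(static_cast<double>(freeBytes) * fillFraction);

    std::vector<std::vector<std::size_t>> chunks;
    std::vector<std::size_t> current;
    std::size_t used = 0;

    for (std::size_t i = 0; i < recordSizes.size(); ++i) {
        // Flush the current chunk if this record would exceed the budget.
        if (!current.empty() && used + recordSizes[i] > budget) {
            chunks.push_back(std::move(current));
            current.clear();
            used = 0;
        }
        current.push_back(i);
        used += recordSizes[i];
    }
    if (!current.empty()) {
        chunks.push_back(std::move(current));
    }
    return chunks;
}
```

A lower fillFraction produces more, smaller chunks and leaves more device memory free for other work; a higher value reduces the number of transfers.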

◆ insertFastxFile()

template<typename Config >
FastxInsertReport cusbf::Filter< Config >::insertFastxFile ( std::string_view  path,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
)
inline

Inserts all k-mers from a FASTA/FASTQ file via chunked streaming.

See also
insertFastx

Definition at line 608 of file BloomFilter.cuh.

611 {}
612 ) {
613 auto input = detail::openFastxFile(path);
614 return insertFastxStream(*input, path, fillFraction, stream);
615 }

◆ insertRecordBatch()

template<typename Config >
FastxInsertReport cusbf::Filter< Config >::insertRecordBatch ( RecordBatchView  batch,
cuda::stream_ref  stream = cudaStream_t{} 
)
inline

Inserts a dense host-resident record batch.

batch.sequence stores the raw record payloads back-to-back without separators. batch.records stores ordered, non-overlapping byte ranges into that dense buffer. The filter injects alphabet separators between records internally, so callers do not need to materialise separator bytes themselves.

Synchronises before returning.

Parameters
batch  Dense record batch to insert.
stream  CUDA stream to use.
Returns
Report summarising records indexed, bases processed, and k-mers inserted.

Definition at line 569 of file BloomFilter.cuh.

569 {}) {
570 const PreparedRecordBatch prepared = prepareRecordBatch(batch);
571 FastxInsertReport report;
572 report.recordsIndexed = prepared.records.size();
573 for (const PreparedRecordRange& record : prepared.records) {
574 report.indexedBases += record.size;
575 report.insertedKmers += record.validKmers;
576 }
577 if (!prepared.sequence.empty()) {
578 (void)insertSequence(prepared.sequence, stream);
579 }
580 return report;
581 }
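
The dense layout described above can be sketched on the host as follows. ByteRange and DenseBatch are hypothetical stand-ins for the library's record range and RecordBatchView types (only the member names sequence and records come from this page), and perRecordWindows ignores invalid symbols for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte range into the dense buffer.
struct ByteRange {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical owning counterpart of a dense record batch.
struct DenseBatch {
    std::string sequence;            // payloads back-to-back, no separators
    std::vector<ByteRange> records;  // ordered, non-overlapping ranges
};

// Packs record payloads back-to-back and records where each one lives.
inline DenseBatch packRecords(const std::vector<std::string>& payloads) {
    DenseBatch batch;
    for (const std::string& payload : payloads) {
        batch.records.push_back(ByteRange{batch.sequence.size(), payload.size()});
        batch.sequence += payload;
    }
    return batch;
}

// Counts k-mer windows per record, so no window spans two records.
inline std::uint64_t perRecordWindows(const DenseBatch& batch, std::size_t k) {
    std::uint64_t total = 0;
    for (const ByteRange& r : batch.records) {
        if (k != 0 && r.size >= k) {
            total += r.size - k + 1;
        }
    }
    return total;
}
```

For the payloads "ACGT", "GG", and "ACGTA" with k = 3, the per-record count is 2 + 0 + 3 = 5, whereas treating the 11-byte dense buffer as one sequence would give 9 windows; the difference is the cross-record windows that separators are meant to suppress.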

◆ insertSequence()

template<typename Config >
uint64_t cusbf::Filter< Config >::insertSequence ( std::string_view  sequence,
cuda::stream_ref  stream = cudaStream_t{} 
)
inline

Inserts all valid k-mers from a host-resident sequence.

Copies the sequence to device, launches the insert kernel, and synchronises before returning. K-mers containing characters outside {A,C,G,T,a,c,g,t} are skipped.

Parameters
sequence  Raw nucleotide sequence.
stream  CUDA stream to use (default: null stream).
Returns
Number of k-mers attempted (sequences shorter than k yield 0).

Definition at line 519 of file BloomFilter.cuh.

519 {}) {
520 if (recordSymbolCount(sequence.size()) < Config::k) {
521 return 0;
522 }
523
524 const uint64_t totalKmers = recordKmerCount(sequence.size());
525 const auto d_sequence = stagedSequenceView({sequence.data(), sequence.size()}, stream);
526 launchInsertSequence(d_sequence, stream);
527 stream.sync();
528 return totalKmers;
529 }
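
Since the return value counts k-mers attempted rather than k-mers actually inserted, a caller who needs the number of valid windows can compute it on the host. The sketch below applies the skip rule stated above (only A, C, G, T in either case are valid); isAcgt and countValidKmers are hypothetical helpers, not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// True for the eight characters the documentation lists as valid.
inline bool isAcgt(char c) {
    switch (c) {
        case 'A': case 'C': case 'G': case 'T':
        case 'a': case 'c': case 'g': case 't':
            return true;
        default:
            return false;
    }
}

// Counts windows of length k that contain only valid characters.
inline std::uint64_t countValidKmers(const std::string& sequence, std::size_t k) {
    if (k == 0) {
        return 0;
    }
    std::uint64_t valid = 0;
    std::size_t run = 0;  // length of the current run of valid characters
    for (char c : sequence) {
        run = isAcgt(c) ? run + 1 : 0;
        if (run >= k) {
            ++valid;  // the window ending at this character is fully valid
        }
    }
    return valid;
}
```

With k = 3, the sequence "ACGTNACGT" has 7 windows in total, of which the 3 that overlap the N are skipped, leaving 4 valid k-mers.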

◆ insertSequenceDevice()

template<typename Config >
uint64_t cusbf::Filter< Config >::insertSequenceDevice ( device_span< const char >  d_sequence,
cuda::stream_ref  stream = cudaStream_t{} 
)
inline

Async insert of k-mers from a device-resident sequence.

Does not synchronise the stream; the caller is responsible for ordering relative to downstream operations.

Parameters
d_sequence  Device-resident nucleotide sequence.
stream  CUDA stream to use.
Returns
Number of k-mers attempted.

Definition at line 541 of file BloomFilter.cuh.

543 {}
544 ) {
545 const uint64_t totalKmers = sequenceKmerCount(d_sequence);
546 if (totalKmers == 0) {
547 return 0;
548 }
549
550 launchInsertSequence(d_sequence, stream);
551 return totalKmers;
552 }

◆ loadFactor()

template<typename Config >
float cusbf::Filter< Config >::loadFactor ( ) const
inline

Computes the fraction of set bits in the filter.

Returns
Load factor in [0, 1].

Definition at line 839 of file BloomFilter.cuh.

839 {
840 const auto* wordsBegin =
841 reinterpret_cast<const uint64_t*>(thrust::raw_pointer_cast(d_shards_.data()));
842 const uint64_t totalWords = numShards_ * Config::blockWordCount;
843 const uint64_t setBits = thrust::transform_reduce(
844 thrust::device,
845 wordsBegin,
846 wordsBegin + totalWords,
847 [] __device__(uint64_t w) -> uint64_t { return cuda::std::popcount(w); },
848 uint64_t{0},
849 cuda::std::plus<uint64_t>()
850 );
851 return static_cast<float>(setBits) / static_cast<float>(filterBits_);
852 }
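
The same computation can be checked on the host against a copy of the filter words. This is a minimal sketch using a portable bit-counting loop in place of cuda::std::popcount; popcount64 and hostLoadFactor are hypothetical helpers.

```cpp
#include <cstdint>
#include <vector>

// Counts set bits by clearing the lowest set bit until none remain.
inline unsigned popcount64(std::uint64_t w) {
    unsigned count = 0;
    while (w != 0) {
        w &= w - 1;
        ++count;
    }
    return count;
}

// Fraction of set bits across all 64-bit words; 0 for an empty buffer.
inline double hostLoadFactor(const std::vector<std::uint64_t>& words) {
    if (words.empty()) {
        return 0.0;
    }
    std::uint64_t setBits = 0;
    for (std::uint64_t w : words) {
        setBits += popcount64(w);
    }
    return static_cast<double>(setBits) / static_cast<double>(words.size() * 64);
}
```

For a single 256-bit shard with 8 bits set, the result is 8 / 256 = 0.03125.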

◆ numShards()

template<typename Config >
uint64_t cusbf::Filter< Config >::numShards ( ) const
inline

Returns the number of shards.

Definition at line 860 of file BloomFilter.cuh.

860 {
861 return numShards_;
862 }

◆ operator=() [1/2]

template<typename Config >
Filter & cusbf::Filter< Config >::operator= ( const Filter< Config > &  )
delete

◆ operator=() [2/2]

template<typename Config >
Filter & cusbf::Filter< Config >::operator= ( Filter< Config > &&  )
default

◆ queryFastx()

template<typename Config >
FastxQueryReport cusbf::Filter< Config >::queryFastx ( std::istream &  input,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries all k-mers from a FASTA/FASTQ input stream via chunked streaming.

See also
insertFastx

Definition at line 723 of file BloomFilter.cuh.

726 {}
727 ) const {
728 return queryFastxStream(input, "<stream>", fillFraction, stream);
729 }

◆ queryFastxDetailed()

template<typename Config >
FastxDetailedQueryReport cusbf::Filter< Config >::queryFastxDetailed ( std::istream &  input,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries all k-mers from a FASTA/FASTQ input stream via chunked streaming and preserves per-record hit vectors.

The returned report keeps aggregate counts plus one detailed result per record, in source order. Each detailed hit vector contains one byte per k-mer window: 1 = present, 0 = absent. Invalid-symbol windows remain in the vector as 0 and are excluded from queriedKmers.

See also
queryFastx

Definition at line 795 of file BloomFilter.cuh.

798 {}
799 ) const {
800 return queryFastxDetailedStream(input, "<stream>", fillFraction, stream);
801 }

◆ queryFastxFile()

template<typename Config >
FastxQueryReport cusbf::Filter< Config >::queryFastxFile ( std::string_view  path,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries all k-mers from a FASTA/FASTQ file via chunked streaming.

See also
queryFastx

Definition at line 736 of file BloomFilter.cuh.

739 {}
740 ) const {
741 auto input = detail::openFastxFile(path);
742 return queryFastxStream(*input, path, fillFraction, stream);
743 }

◆ queryFastxFileDetailed()

template<typename Config >
FastxDetailedQueryReport cusbf::Filter< Config >::queryFastxFileDetailed ( std::string_view  path,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries all k-mers from a FASTA/FASTQ file via chunked streaming and preserves per-record hit vectors.

See also
queryFastxDetailed

Definition at line 809 of file BloomFilter.cuh.

812 {}
813 ) const {
814 auto input = detail::openFastxFile(path);
815 return queryFastxDetailedStream(*input, path, fillFraction, stream);
816 }

◆ queryFastxFileRecords()

template<typename Config >
template<typename Consumer >
FastxQueryReport cusbf::Filter< Config >::queryFastxFileRecords ( std::string_view  path,
Consumer &&  consume,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries a FASTA/FASTQ file and emits one record result per parsed record.

See also
queryFastxRecords

Definition at line 774 of file BloomFilter.cuh.

778 {}
779 ) const {
780 auto input = detail::openFastxFile(path);
781 return queryFastxRecordsStream(*input, path, consume, fillFraction, stream);
782 }

◆ queryFastxRecords()

template<typename Config >
template<typename Consumer >
FastxQueryReport cusbf::Filter< Config >::queryFastxRecords ( std::istream &  input,
Consumer &&  consume,
double  fillFraction = 0.7,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries a FASTA/FASTQ stream and emits one record result per parsed record.

The callback receives record headers, record sequences, aggregate counts, and the per-window hit span for each record as soon as its chunk has been processed. The hit span remains valid only for the duration of the callback.

Parameters
input  Input stream containing FASTA or FASTQ records.
consume  Per-record callback.
fillFraction  Fraction of free GPU memory to fill per chunk (default 0.7).
stream  CUDA stream to use.
Returns
Aggregate query summary for the whole stream.

Definition at line 759 of file BloomFilter.cuh.

763 {}
764 ) const {
765 return queryFastxRecordsStream(input, "<stream>", consume, fillFraction, stream);
766 }
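
Because the hit span is only valid during the callback, a consumer that needs the hits later has to copy them out. The sketch below shows that pattern with stand-in types: RecordResult, emitRecords, and HitCollector are all hypothetical, and the real callback argument (RecordQueryView) may expose different members than the ones assumed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-record result with a borrowed (non-owning) hit span.
struct RecordResult {
    std::string header;
    const std::uint8_t* hits;  // valid only while the callback runs
    std::size_t hitCount;
};

// Hypothetical producer that reuses one scratch buffer for every record,
// which is why the borrowed pointer must not outlive the callback.
template <typename Consumer>
void emitRecords(Consumer&& consume) {
    const std::vector<std::string> headers = {"r1", "r2"};
    const std::vector<std::vector<std::uint8_t>> results = {{1, 0, 1}, {0, 0}};
    std::vector<std::uint8_t> scratch;
    for (std::size_t i = 0; i < headers.size(); ++i) {
        scratch = results[i];  // overwrites the previous record's hits
        consume(RecordResult{headers[i], scratch.data(), scratch.size()});
    }
}

// Consumer that copies each hit span into storage it owns.
struct HitCollector {
    std::vector<std::vector<std::uint8_t>> perRecord;

    void operator()(const RecordResult& record) {
        perRecord.emplace_back(record.hits, record.hits + record.hitCount);
    }
};
```

Passing a named collector (rather than a temporary) lets the caller read the copied results after the call returns.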

◆ queryRecordBatch() [1/2]

template<typename Config >
template<typename Consumer >
FastxQueryReport cusbf::Filter< Config >::queryRecordBatch ( RecordBatchView  batch,
Consumer &&  consume,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries a dense host-resident record batch and streams per-record results.

The callback receives one RecordQueryView per input record in source order. The hit span remains valid only for the duration of the callback.

Synchronises before returning.

Parameters
batch  Dense record batch to query.
consume  Per-record callback.
stream  CUDA stream to use.
Returns
Aggregate query summary for the whole batch.

Definition at line 710 of file BloomFilter.cuh.

713 {}
714 ) const {
715 return queryPreparedRecordBatch(prepareRecordBatch(batch), batch.sequence, consume, stream);
716 }

◆ queryRecordBatch() [2/2]

template<typename Config >
FastxQueryReport cusbf::Filter< Config >::queryRecordBatch ( RecordBatchView  batch,
cuda::stream_ref  stream = cudaStream_t{} 
) const
inline

Queries a dense host-resident record batch and returns aggregate counts.

batch.sequence stores raw record payloads back-to-back without separators. batch.records stores ordered, non-overlapping byte ranges into that dense buffer. The filter injects alphabet separators between records internally, so cross-record k-mers are never formed.

Synchronises before returning.

Parameters
batch  Dense record batch to query.
stream  CUDA stream to use.
Returns
Aggregate query summary for the whole batch.

Definition at line 692 of file BloomFilter.cuh.

692 {}) const {
693 return queryRecordBatch(batch, [](const RecordQueryView&) {}, stream);
694 }

The documentation for this class was generated from the following file: