πŸ“Š Comparative Analysis

Algorithmic comparison with industry-standard queue implementations


Algorithmic Analysis: mpmc-std Design Principles

This document examines the algorithmic foundations of mpmc-std compared to established queue implementations, focusing on design trade-offs and architectural differences rather than performance claims.

Algorithmic Approaches Overview

Memory Layout Comparison


LMAX Disruptor (Ring Buffer):
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚  0  β”‚  1  β”‚  2  β”‚  3  β”‚  4  β”‚  5  β”‚  6  β”‚  7  β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
      ↑                               ↑
   Producer                       Consumer
   Sequence                       Sequence
Michael & Scott (Linked Nodes):
Head β†’ [Node] β†’ [Node] β†’ [Node] β†’ [Node] β†’ Tail
       β”Œβ”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”
       β”‚Data β”‚   β”‚Data β”‚   β”‚Data β”‚   β”‚Data β”‚
       β”‚Next*β”‚   β”‚Next*β”‚   β”‚Next*β”‚   β”‚Next*β”‚
       β””β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”˜
mpmc-std (Sequence-Coordinated Array):
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ S:0 β”‚ S:1 β”‚ S:2 β”‚ S:3 β”‚ S:4 β”‚ S:5 β”‚ S:6 β”‚ S:7 β”‚
β”‚ [D] β”‚ [D] β”‚ [D] β”‚ [D] β”‚ [D] β”‚ [D] β”‚ [D] β”‚ [D] β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
S = Sequence Number, D = Data Slot

Core Algorithmic Differences

1. LMAX Disruptor Algorithm

Core Principle: Ring buffer with sequence barriers and memory barriers for coordination.



Algorithm: Disruptor Send
1. claim_slot = next_sequence++
2. wait_for_slot_available(claim_slot)
3. write_data_to_slot(claim_slot)
4. publish_sequence(claim_slot)
5. notify_consumers()
Memory Barriers:
- Store-Store before publish
- Load-Acquire for dependency tracking
- Complex barrier coordination for multiple consumers

Key Characteristics:

- Pre-allocated ring buffer: no allocation on the hot path
- Sequence barriers coordinate producers and consumers and enable batch publishing
- Strong ordering guarantees, at the cost of complex barrier management
- No ABA handling required (sequences only increase)

2. Michael & Scott Algorithm

Core Principle: Lock-free linked list with atomic compare-and-swap on head/tail pointers.



Algorithm: Michael & Scott Enqueue
1. node = allocate_new_node(data)
2. loop:
3.   tail = load_tail()
4.   next = load_tail_next() 
5.   if tail == load_tail():        // re-check: tail unchanged since step 3
6.     if next == null:
7.       if CAS(tail.next, null, node):
8.         break
9.     else:
10.      CAS(tail_ptr, tail, next)  // Help move tail
11. CAS(tail_ptr, tail, node)
Memory Pattern:
- Dynamic allocation per operation
- Pointer chasing through linked structure
- ABA problem mitigation required

Key Characteristics:

- Unbounded: one dynamic allocation per enqueue
- Lock-free via CAS on head/tail pointers, with a "helping" step to advance a lagging tail
- Pointer chasing through linked nodes hurts cache locality
- Requires ABA mitigation (tagged pointers or safe memory reclamation)

3. mpmc-std Algorithm

Core Principle: Array slots with per-slot sequence numbers for coordination.



Algorithm: mpmc-std Send
1. pos = head.fetch_add(1, Relaxed)
2. slot = &buffer[pos & mask]
3. while slot.sequence.load(Acquire) != pos:
4.   spin_loop_hint()
5. slot.item.write(data)
6. slot.sequence.store(pos + 1, Release)
Sequence States:
- slot.sequence == pos     β†’ Slot free, ready for a producer
- slot.sequence == pos + 1 β†’ Data written, available for a consumer
- slot.sequence == pos + N β†’ Consumed; ready again after wraparound (N = buffer capacity)

Key Characteristics:

- Bounded, pre-allocated array: each slot's state is a single sequence number
- One atomic fetch_add per operation; lock-free producers and consumers
- No ABA handling required: sequences increase monotonically across laps

Detailed Algorithmic Analysis

Coordination Mechanisms

Disruptor: Barrier-Based Coordination


Producer Barriers:     Consumer Barriers:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Claim Sequence  β”‚   β”‚ Wait for Data   β”‚
β”‚ Wait for Space  β”‚   β”‚ Process Batch   β”‚
β”‚ Write Data      β”‚   β”‚ Update Cursor   β”‚
β”‚ Publish Batch   β”‚   β”‚ Signal Complete β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                       β”‚
        └──── Memory Barriers β”€β”€β”˜

mpmc-std: Sequence Matching


Producer Flow:              Consumer Flow:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Get Position    β”‚        β”‚ Get Position    β”‚
β”‚ Wait for Slot   β”‚ ←---β†’  β”‚ Wait for Data   β”‚
β”‚ Write Data      β”‚        β”‚ Read Data       β”‚
β”‚ Release Slot    β”‚        β”‚ Release Slot    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Comparison Summary Table

| Feature | LMAX Disruptor | Michael & Scott | mpmc-std | mpmc-std SIMD |
|---|---|---|---|---|
| Coordination | Sequence barriers | Atomic pointer ops | Per-slot sequence | Vectorized sequences |
| Memory Layout | Ring buffer | Linked nodes | Fixed array | SIMD-aligned array |
| Memory Management | Pre-allocated | Dynamic allocation | Pre-allocated | Pre-allocated |
| Ordering | Strong (barriers) | FIFO (linked list) | Mathematical sequence | Mathematical sequence |
| Complexity | High (multiple seqs) | Medium (ABA, helping) | Low (slot state) | Low (vectorized) |
| Scalability | High (batch, barriers) | High (lock-free) | High (lock-free) | Higher (SIMD batches) |
| ABA Handling | Not required | Required | Not required | Not required |
| Producer/Consumer | Multiple | Multiple | Multiple | Multiple |
| Data Types | Generic | Generic | Generic | u64 optimized |
| Batch Operations | Optional | No | No | Yes (4x u64) |
| Typical Use Case | Low-latency trading | General-purpose queues | General-purpose | Numeric workloads |

SIMD Algorithm Extension

The mpmc-std SIMD variant extends the basic algorithm with vectorized operations for u64 data types.

SIMD Coordination Algorithm


Traditional Slot Check (4 operations):
for i in 0..4:
    if slot[i].sequence == expected[i]:  // 4 separate checks
        slot_available[i] = true
SIMD Slot Check (1 vectorized operation):
sequences = [slot[0].seq, slot[1].seq, slot[2].seq, slot[3].seq]  // Load 4 sequences
expected  = [head+0, head+1, head+2, head+3]                    // Generate expected
mask = simd_eq(sequences, expected)                              // Compare all at once
if mask.all():                                                   // All slots ready
    claim_batch()

SIMD vs Traditional Performance Profile

Memory Access Pattern:


Traditional: 4 separate loads + 4 separate comparisons + 4 branches
SIMD:        1 vectorized load + 1 vectorized compare + 1 branch

Theoretical Performance Gain:

Up to 4x fewer instructions on the slot-readiness check: one vectorized load, compare, and branch replace four of each. In practice the gain is smaller, since memory latency rather than instruction count usually dominates.

Real-World Results:

SIMD Algorithmic Trade-offs

Advantages:

- One vectorized load, compare, and branch replace four scalar ones on the readiness check
- Batch claiming amortizes coordination cost across four slots

Limitations:

- Specialized for u64 payloads; generic data falls back to the scalar queue
- Fixed batch width (4x u64) and SIMD-capable hardware assumed
- A partially ready batch (mask not all set) falls back to the scalar path

Use Case Optimization:


// Optimal: numeric processing with known batch sizes
let numeric_data: Vec<u64> = sensor_readings();
simd_queue.send_batch(&numeric_data[0..4])?;

// Suboptimal for SIMD: mixed types or variable sizes; use the
// regular (scalar) queue instead
let mixed_data: Vec<String> = user_messages();
regular_queue.send(mixed_data[0].clone())?;