← Back to Documentation Index

Algorithmic Analysis: mpmc-std Design Principles

This document examines the algorithmic foundations of mpmc-std compared to established queue implementations, focusing on design trade-offs and architectural differences rather than performance claims.

Algorithmic Approaches Overview

Memory Layout Comparison


LMAX Disruptor (Ring Buffer):
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│  0  │  1  │  2  │  3  │  4  │  5  │  6  │  7  │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
      ↑                               ↑
   Producer                       Consumer
   Sequence                       Sequence
Michael & Scott (Linked Nodes):
Head → [Node] → [Node] → [Node] → [Node] → Tail
       ┌─────┐   ┌─────┐   ┌─────┐   ┌─────┐
       │Data │   │Data │   │Data │   │Data │
       │Next*│   │Next*│   │Next*│   │Next*│
       └─────┘   └─────┘   └─────┘   └─────┘
mpmc-std (Sequence-Coordinated Array):
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ S:0 │ S:1 │ S:2 │ S:3 │ S:4 │ S:5 │ S:6 │ S:7 │
│ [D] │ [D] │ [D] │ [D] │ [D] │ [D] │ [D] │ [D] │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
S = Sequence Number, D = Data Slot

Core Algorithmic Differences

1. LMAX Disruptor Algorithm

Core Principle: Ring buffer with sequence barriers and memory barriers for coordination.


Algorithm: Disruptor Send
1. claim_slot = next_sequence++
2. wait_for_slot_available(claim_slot)
3. write_data_to_slot(claim_slot)
4. publish_sequence(claim_slot)
5. notify_consumers()
Memory Barriers:
- Store-Store before publish
- Load-Acquire for dependency tracking
- Complex barrier coordination for multiple consumers

Key Characteristics:

Coordination: Complex sequence barriers and dependency chains
Memory Management: Pre-allocated ring buffer
Ordering: Strong ordering guarantees with memory barriers
Complexity: Multiple sequence types (producer, consumer, gating)

2. Michael & Scott Algorithm

Core Principle: Lock-free linked list with atomic compare-and-swap on head/tail pointers.


Algorithm: Michael & Scott Enqueue
1. node = allocate_new_node(data)
2. loop:
3.   tail = load_tail()
4.   next = load_tail_next() 
5.   if tail == current_tail:
6.     if next == null:
7.       if CAS(tail.next, null, node):
8.         break
9.     else:
10.      CAS(tail_ptr, tail, next)  // Help move tail
11. CAS(tail_ptr, tail, node)
Memory Pattern:
- Dynamic allocation per operation
- Pointer chasing through linked structure
- ABA problem mitigation required

Key Characteristics:

Coordination: Atomic pointer manipulation with helping protocol
Memory Management: Dynamic allocation per enqueue/dequeue
Ordering: Natural FIFO through linked structure
Complexity: ABA problem handling, memory reclamation

3. mpmc-std Algorithm

Core Principle: Array slots with per-slot sequence numbers for coordination.


Algorithm: mpmc-std Send
1. pos = head.fetch_add(1, Relaxed)
2. slot = &buffer[pos & mask]
3. while slot.sequence.load(Acquire) != pos:
4.   spin_loop_hint()
5. slot.item.write(data)
6. slot.sequence.store(pos + 1, Release)
Sequence States:
- slot.sequence == pos     → Ready for producer
- slot.sequence == pos + 1 → Data available for consumer  
- slot.sequence == pos + N → Consumed, waiting for wraparound

Key Characteristics:

Coordination: Per-slot sequence matching
Memory Management: Fixed pre-allocated array
Ordering: Mathematical sequence progression
Complexity: Simple slot state machine

Detailed Algorithmic Analysis

Coordination Mechanisms

Disruptor: Barrier-Based Coordination


Producer Barriers:     Consumer Barriers:
┌─────────────────┐   ┌─────────────────┐
│ Claim Sequence  │   │ Wait for Data   │
│ Wait for Space  │   │ Process Batch   │
│ Write Data      │   │ Update Cursor   │
│ Publish Batch   │   │ Signal Complete │
└─────────────────┘   └─────────────────┘
        │                       │
        └──── Memory Barriers ──┘

mpmc-std: Sequence Matching


Producer Flow:              Consumer Flow:
┌─────────────────┐        ┌─────────────────┐
│ Get Position    │        │ Get Position    │
│ Wait for Slot   │ ←---→  │ Wait for Data   │
│ Write Data      │        │ Read Data       │
│ Release Slot    │        │ Release Slot    │
└─────────────────┘        └─────────────────┘

Comparison Summary Table

Feature	LMAX Disruptor	Michael & Scott	mpmc-std	mpmc-std SIMD
Coordination	Sequence barriers	Atomic pointer ops	Per-slot sequence	Vectorized sequences
Memory Layout	Ring buffer	Linked nodes	Fixed array	SIMD-aligned array
Memory Management	Pre-allocated	Dynamic allocation	Pre-allocated	Pre-allocated
Ordering	Strong (barriers)	FIFO (linked list)	Mathematical sequence	Mathematical sequence
Complexity	High (multiple seqs)	Medium (ABA, helping)	Low (slot state)	Low (vectorized)
Scalability	High (batch, barriers)	High (lock-free)	High (lock-free)	Higher (SIMD batches)
ABA Handling	Not required	Required	Not required	Not required
Producer/Consumer	Multiple	Multiple	Multiple	Multiple
Data Types	Generic	Generic	Generic	u64 optimized
Batch Operations	Optional	No	No	Yes (4x u64)
Typical Use Case	Low-latency trading	General-purpose queues	General-purpose	Numeric workloads

SIMD Algorithm Extension

The mpmc-std SIMD variant extends the basic algorithm with vectorized operations for u64 data types.

SIMD Coordination Algorithm


Traditional Slot Check (4 operations):
for i in 0..4:
    if slot[i].sequence == expected[i]:  // 4 separate checks
        slot_available[i] = true
SIMD Slot Check (1 vectorized operation):
sequences = [slot[0].seq, slot[1].seq, slot[2].seq, slot[3].seq]  // Load 4 sequences
expected  = [head+0, head+1, head+2, head+3]                    // Generate expected
mask = simd_eq(sequences, expected)                              // Compare all at once
if mask.all():                                                   // All slots ready
    claim_batch()

SIMD vs Traditional Performance Profile

Memory Access Pattern:


Traditional: 4 separate loads + 4 separate comparisons + 4 branches
SIMD:        1 vectorized load + 1 vectorized compare + 1 branch

Theoretical Performance Gain:

Memory: 4x fewer load operations (vectorized)
ALU: 4x fewer comparison operations (parallel)
Branch: 4x fewer conditional branches
Cache: Better spatial locality with vectorized access

Real-World Results:

Single-threaded: 10-30% improvement
High-contention: Up to 1.8x speedup (4+ thread pairs)
Optimal workload: u64 numeric processing in 4-element batches

SIMD Algorithmic Trade-offs

Advantages:

Vectorized sequence checking reduces instruction count
Better CPU pipeline utilization
Improved cache efficiency for batch operations
Maintains all lockless guarantees

Limitations:

Requires nightly Rust (unstable portable_simd)
Limited to u64 data types
Minimum capacity requirements (16+ elements)
Best performance only with 4-aligned batch sizes

Use Case Optimization:


// Optimal: Numeric processing with known batch sizes
let numeric_data: Vec<u64> = sensor_readings();
simd_queue.send_batch(&numeric_data[0..4])?;
// Suboptimal: Mixed types or variable sizes  
let mixed_data: Vec<String> = user_messages();
regular_queue.send(mixed_data[0].clone())?;  // Better choice

📊 Comparative Analysis

Algorithmic Analysis: mpmc-std Design Principles

Algorithmic Approaches Overview

Memory Layout Comparison

Core Algorithmic Differences

1. LMAX Disruptor Algorithm

2. Michael & Scott Algorithm

3. mpmc-std Algorithm

Detailed Algorithmic Analysis

Coordination Mechanisms

Disruptor: Barrier-Based Coordination

mpmc-std: Sequence Matching

Comparison Summary Table

SIMD Algorithm Extension

SIMD Coordination Algorithm

SIMD vs Traditional Performance Profile

SIMD Algorithmic Trade-offs