Parallel Algorithms
Parallel Prefix Sum (Scan) with CUDA (2007)
A classic reference on implementing a work-efficient parallel prefix sum algorithm.
Thinking Parallel (2012)
Three-part series on parallel algorithms, covering GPU collision detection, tree traversal, and tree construction.