An educational repository for understanding speculative decoding and the math behind it. The goal is to understand why it works, not just how to use it.
Speculative Decoding/Sampling was introduced in "Fast Inference from Transformers via Speculative Decoding" by Google and "Accelerating Large Language Model Decoding with Speculative Sampling" by Google DeepMind.
- The repository will contain a few notebooks covering:
  - Understanding Model Inference and Different Decoding/Generation Techniques
    - Deterministic Decoding
      - Greedy Decoding
      - Beam Search
    - Probabilistic Decoding/Sampling
      - Sampling with Temperature
      - Top-K Sampling
      - Top-P Sampling
  - Understanding the Speculative Decoding Theory
    - Proving that speculative decoding actually samples from the target distribution
  - Can we use models with different but similar architectures for speculative decoding?
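
As a preview of the theory notebook, here is a minimal sketch of a single accept/reject step of speculative sampling, using plain Python lists as toy distributions over a shared vocabulary. It is an illustration under stated assumptions, not the code from the notebooks: the names `speculative_step` and `output_distribution` are hypothetical, and a real implementation would run this step over several drafted tokens with actual model outputs. The draft token `x ~ q` is accepted with probability `min(1, p(x) / q(x))`; on rejection, a token is resampled from the normalized residual `max(0, p - q)`. The second function computes the resulting output distribution analytically, so you can check that it matches the target `p`.

```python
import random


def speculative_step(p_target, q_draft, rng=random):
    """One accept/reject step over a shared vocabulary (toy sketch).

    p_target, q_draft: lists of probabilities (hypothetical toy inputs).
    Returns (token_index, accepted_flag).
    """
    vocab = range(len(q_draft))
    # The draft model proposes a token x ~ q.
    x = rng.choices(vocab, weights=q_draft, k=1)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    return rng.choices(vocab, weights=residual, k=1)[0], False


def output_distribution(p_target, q_draft):
    """Exact distribution of the token returned by speculative_step."""
    # Probability of proposing x and accepting it: q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accept_mass = [min(p, q) for p, q in zip(p_target, q_draft)]
    reject_prob = 1.0 - sum(accept_mass)
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    total = sum(residual)
    if total == 0.0:  # p == q, so proposals are never rejected
        return accept_mass
    return [a + reject_prob * r / total for a, r in zip(accept_mass, residual)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution
    q = [0.2, 0.6, 0.2]  # toy draft distribution
    print(output_distribution(p, q))
```

The key observation is that the rejection probability equals the total residual mass, so the two terms add up to `min(p, q) + max(0, p - q) = p` for every token, regardless of how good the draft distribution is. A better draft only raises the acceptance rate.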
Key terms: Target Model, Draft Model, Speculative Sampling, Speculative Decoding

References: "Fast Inference from Transformers via Speculative Decoding"; "Accelerating Large Language Model Decoding with Speculative Sampling"