amd
PARD is a high-performance speculative decoding method that can convert autoregressive draft models into parallel draft models at low cost, significantly accelerating the inference of large language models.