Andrej Karpathy presented speculative execution for large language models, an inference-time optimization that helps large models work around memory-bandwidth limits. With the technique, known as "speculative decoding", a smaller draft model first predicts several upcoming tokens, and the large model then reviews them together, keeping the ones it agrees with and correcting the first one it does not. Because the large model checks a batch of candidate tokens in one pass rather than generating them one at a time, it needs fewer memory accesses per generated token. The technique works because most tokens are relatively easy to predict, so even a small model usually gets them right. This trick speeds up large-model inference and reduces latency.
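The draft-then-verify loop can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `target_model` and `draft_model` are hypothetical toy functions standing in for a large and a small language model, tokens are plain integers, and decoding is greedy. Real systems verify all drafted positions in a single batched forward pass of the large model and, when sampling, use a probabilistic acceptance rule rather than the exact-match check shown here.

```python
from typing import List, Tuple


def target_model(prefix: List[int]) -> int:
    """Hypothetical 'large' model: the reference output we must reproduce."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target on most prefixes."""
    if len(prefix) % 5 == 0:
        # Deliberately wrong on some prefixes to simulate a "hard" token.
        return (target_model(prefix) + 1) % 50
    return target_model(prefix)


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens, one after another.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2. The target model scores every drafted position plus one more.
        #    Counted as ONE pass: a real model does this in a single batched
        #    forward call, which is where the savings come from.
        target_passes += 1
        verified = [target_model(tokens + draft[:i]) for i in range(k + 1)]

        # 3. Accept the longest prefix of the draft that the target agrees with.
        n_accept = 0
        while n_accept < k and draft[n_accept] == verified[n_accept]:
            n_accept += 1

        # 4. Keep the accepted tokens, then the target's own next token
        #    (a correction on mismatch, a bonus token if all k were accepted).
        tokens.extend(draft[:n_accept])
        tokens.append(verified[n_accept])

    # The last round may overshoot; trim to the requested length.
    return tokens[: len(prompt) + n_new], target_passes
```

A short usage check, under the same assumptions: the speculative output matches plain greedy decoding token for token, while the large model is invoked fewer times than once per token.

```python
prompt = [1, 2, 3]
out, passes = speculative_decode(prompt, n_new=20, k=4)
assert out == greedy_decode(prompt, 20)
print(f"{passes} verification passes for 20 new tokens")
```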