Robotic-Action-Frame-Prediction-with-InstructPix2Pix
This repository contains the code and configuration files for fine-tuning a multimodal `InstructPix2Pix` model to predict future robotic action frames. The model generates 256×256 images conditioned on the current observation and a textual instruction.
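
As a rough illustration of the conditioning described above (current frame plus instruction in, predicted frame out), here is a minimal inference sketch. It assumes the Hugging Face `diffusers` library and its `StableDiffusionInstructPix2PixPipeline`. The function name `predict_next_frame`, the default checkpoint `timbrooks/instruct-pix2pix` (the public base model, not this repository's fine-tuned weights), and the step and guidance values are illustrative assumptions, not the repository's actual scripts or settings.

```python
TARGET_SIZE = (256, 256)  # resolution stated in this README
BASE_MODEL = "timbrooks/instruct-pix2pix"  # assumption: public base checkpoint


def predict_next_frame(observation_path, instruction, model_id=BASE_MODEL, steps=20):
    """Hypothetical helper: predict a future frame from one observation and a text instruction."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInstructPix2PixPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device)

    # Resize the current observation to the resolution the model is trained on.
    observation = Image.open(observation_path).convert("RGB").resize(TARGET_SIZE)

    result = pipe(
        prompt=instruction,        # textual instruction
        image=observation,         # current observation (conditioning image)
        num_inference_steps=steps,
        image_guidance_scale=1.5,  # assumed value: weight on the input frame
        guidance_scale=7.5,        # assumed value: weight on the instruction
    )
    return result.images[0]        # PIL image of the predicted frame
```

A call might look like `predict_next_frame("frame_000.png", "pick up the red block")`, where both the file name and the instruction are placeholders. To use fine-tuned weights, pass their local path or model ID as `model_id`.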