In recent years, developers have increasingly become targets of cyberattacks delivered through open-source package repositories. Attackers have steadily refined their tactics, using counterfeit software packages in supply chain attacks and, more recently, extending them to forged artificial intelligence (AI) frameworks and tainted machine learning (ML) models. A recent study revealed one such attack: hackers uploaded malicious software packages disguised as development toolkits from Alibaba Cloud's AI Lab.

Researchers discovered three malicious packages on the Python Package Index (PyPI) that impersonated Alibaba Cloud AI Lab's SDKs but contained no legitimate functionality. These malicious packages used tainted ML models stored in the Pickle format to steal information from users' environments and send it to servers controlled by attackers.


Hackers may have chosen to hide malicious code within ML models because current security tools are only beginning to support detection of malicious behavior in ML file formats, which have traditionally been treated as a medium for sharing data rather than a vehicle for distributing executable code.
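That distinction matters because a Pickle file is not passive data: deserializing one can invoke any importable callable. The snippet below is a minimal, harmless illustration of the general `__reduce__` vector, not the attackers' actual payload:

```python
import pickle

class NotAModel:
    """An object whose pickled form runs code the moment it is loaded."""
    def __reduce__(self):
        # On pickle.loads, the unpickler calls eval("6 * 7").
        # A malicious "model" would substitute something like
        # os.system or a base64-decoding stager here instead.
        return (eval, ("6 * 7",))

blob = pickle.dumps(NotAModel())  # looks like ordinary binary data
print(pickle.loads(blob))         # "loading" the data executes the payload
```

Nothing about `blob` looks executable on the surface, which is exactly why model files in this format can slip past tools that only inspect a package's Python source.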

Pickle is an official Python module for object serialization and is commonly used to store ML models in the PyTorch ecosystem. With PyTorch's widespread adoption among AI developers, the Pickle format has become increasingly popular. Hackers have already exploited it to host tainted models on platforms like Hugging Face, and although Hugging Face employs the open-source tool Picklescan to detect potential risks, researchers noted that bypassing detection remains possible.
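Scanners like Picklescan work at the opcode level: before anything is loaded, they walk the pickle stream and flag the globals it would import. The sketch below conveys the idea using only the standard library; `suspicious_globals` is a hypothetical helper, far cruder than the real tool:

```python
import io
import pickle
import pickletools

def suspicious_globals(data: bytes) -> list:
    """List the module/name pairs a pickle stream would import.

    A benign weights file references little beyond tensor containers;
    names like os.system or builtins.eval are red flags.
    """
    names = []
    for opcode, arg, _pos in pickletools.genops(io.BytesIO(data)):
        if opcode.name == "GLOBAL":          # protocols <= 3: name inline
            names.append(arg)
        elif opcode.name == "STACK_GLOBAL":  # protocols >= 4 build the
            names.append("<stack global>")   # name on the stack instead
    return names

class Payload:
    def __reduce__(self):
        return (eval, ("1 + 1",))

# Protocol 2 keeps the import visible as a GLOBAL opcode.
print(suspicious_globals(pickle.dumps(Payload(), protocol=2)))
```

As the researchers noted, though, this kind of static scanning can still be bypassed, which is why the format itself remains risky.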

The three malicious packages involved in this attack were named aliyun-ai-labs-snippets-sdk, ai-labs-snippets-sdk, and aliyun-ai-labs-sdk. Together they were downloaded about 1,600 times before being discovered and taken down within a day. Developers' computers often hold credentials, API tokens, and other service access keys, making a compromised developer machine a valuable foothold for lateral movement and deeper system infiltration.

These malicious SDKs load a tainted PyTorch model from their __init__.py scripts; on loading, the model executes base64-encoded code that steals the user's login information, network address, and the name of the organization the machine belongs to. The primary targets were likely Chinese developers, since Alibaba Cloud SDKs mainly appeal to local users of the service, but the same technique could be aimed at any developer with a different lure.
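On the defensive side, code that must deserialize Pickle-based files can refuse to resolve unexpected globals at all. Below is a minimal sketch using the standard library; the allowlist is illustrative, not exhaustive, and PyTorch's `torch.load` offers a similar safeguard via its `weights_only=True` mode:

```python
import io
import pickle

# Illustrative allowlist: only these globals may be resolved during load.
SAFE = {("builtins", "list"), ("builtins", "dict"),
        ("collections", "OrderedDict")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in SAFE:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Plain containers of numbers load fine...
print(safe_loads(pickle.dumps({"weights": [0.1, 0.2]})))

# ...while anything that tries to import a callable is rejected.
class Dropper:
    def __reduce__(self):
        return (print, ("this never runs",))

try:
    safe_loads(pickle.dumps(Dropper()))
except pickle.UnpicklingError as e:
    print(e)
```

The key design choice is failing closed: instead of trying to enumerate dangerous names, the loader rejects everything it has not explicitly approved.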

This attack highlights how immature defenses against malicious ML model file formats still are: current security tools remain far from reliable at detecting tainted models.