A research team at Tsinghua University has developed a 4-bit optimizer for neural network training that significantly reduces the memory overhead of training large models, cutting GPU memory usage by up to 57% without compromising accuracy. The team also provides ready-to-use low-precision implementations of Adam and SGD that act as drop-in replacements for the standard full-precision optimizers. This work is an important step toward relieving the GPU memory bottleneck in large-model training.
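To illustrate what "drop-in replacement" means in practice, here is a minimal sketch in PyTorch. The module name `four_bit_optim` and its `AdamW` class are hypothetical placeholders for the team's released optimizer; the only assumption is that it mirrors the constructor and `step()`/`zero_grad()` interface of `torch.optim`, so the rest of the training loop stays unchanged.

```python
import torch
import torch.nn as nn
# import four_bit_optim  # hypothetical package exposing 4-bit Adam/SGD

model = nn.Linear(4096, 4096)

# Standard 32-bit AdamW: keeps two FP32 state tensors (exp_avg, exp_avg_sq)
# per parameter, i.e. roughly 8 bytes of optimizer state per parameter.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

# Hypothetical drop-in replacement: same arguments, but the optimizer states
# are quantized to 4 bits, shrinking state memory to about 1 byte per parameter.
# optimizer = four_bit_optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

x = torch.randn(8, 4096)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()       # the training loop itself is unchanged
optimizer.zero_grad()
```

Because the forward and backward passes are untouched, the memory savings come entirely from the optimizer-state tensors, which dominate memory for adaptive optimizers like Adam on large models.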