With the popularity of Apple's M4 chip, running large language models (LLMs) smoothly on local hardware without relying on the cloud has become a focus for developers. Recently, developer jola shared a detailed write-up on deploying a local AI workflow on a 24GB M4 MacBook Pro. Test results show that the optimized Qwen 3.5-9B model generates 40 tokens per second, offering an efficient alternative for offline work and privacy-sensitive development.

Model Selection: Why Is the 9B Model the "Best Choice"?

In the initial stage of deployment, jola ran a comparative evaluation of several popular options. The test list spanned models from the lightweight Gemma 4B up to larger ones such as GPT-OSS 20B, across platforms including Ollama, llama.cpp, and LM Studio.
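Throughput is straightforward to compare across these platforms because all three expose an OpenAI-compatible HTTP API. The script below is a minimal sketch of such a benchmark; the port, URL, and model identifier are assumptions (LM Studio defaults to port 1234, Ollama to 11434), not details from jola's report.

```python
import time
import requests

# Hypothetical benchmark: measures tokens per second against any local
# OpenAI-compatible server. Ollama (:11434), llama.cpp's server (:8080),
# and LM Studio (:1234) all expose this API shape; the URL and model
# name below are assumptions, not jola's exact configuration.
BASE_URL = "http://localhost:1234/v1"
MODEL = "qwen3.5-9b"  # placeholder model identifier


def tokens_per_second(prompt: str, max_tokens: int = 256) -> float:
    start = time.time()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    resp.raise_for_status()
    # completion_tokens is part of the standard usage block in the response
    generated = resp.json()["usage"]["completion_tokens"]
    return generated / (time.time() - start)


if __name__ == "__main__":
    print(f"{tokens_per_second('Explain mutexes in one paragraph.'):.1f} tok/s")
```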

Testing showed that although models above 20B could theoretically fit into 24GB of memory, in practice they were essentially unusable because of their heavy resource consumption. Smaller 4B models, by contrast, responded quickly but handled complex tool-use tasks poorly. Ultimately, Qwen 3.5-9B (in its Q4_K_S quantized version) stood out: the quantization significantly reduces the memory footprint while preserving reasoning ability, and even leaves headroom for other development tools. More importantly, the model supports a context window of up to 128K tokens, a significant advantage when reading long documents or analyzing large codebases.
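Some back-of-the-envelope arithmetic makes the memory trade-off concrete. The snippet below estimates weight storage alone, assuming roughly 4.5 effective bits per weight for Q4_K_S (an approximation; the KV cache, which grows with context length, is ignored):

```python
# Back-of-the-envelope memory math for why a 9B model at Q4_K_S fits
# comfortably in 24 GB while a 20B model leaves little to spare.
BITS_PER_WEIGHT_Q4_K_S = 4.5  # approximate effective rate, an assumption


def weights_gib(params_billion: float, bpw: float = BITS_PER_WEIGHT_Q4_K_S) -> float:
    """Estimated weight storage in GiB, ignoring KV cache and runtime overhead."""
    return params_billion * 1e9 * bpw / 8 / 1024**3


for size in (4, 9, 20):
    print(f"{size:>2}B model ≈ {weights_gib(size):.1f} GiB of weights")
# Prints roughly 2.1, 4.7, and 10.5 GiB: the 9B model leaves most of
# the 24 GB free for the KV cache, the OS, and other development tools.
```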

Tuning Details: Unlocking the Potential of Chain-of-Thought

To make the local model "smarter" in programming and logical-reasoning scenarios, jola fine-tuned the inference parameters in LM Studio. Setting Temperature to 0.6 and Top_p to 0.95 strikes a balance between creativity and accuracy in the responses.
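A request carrying these settings might look like the sketch below, assuming LM Studio's local server is running on its default port; the model identifier is a placeholder, and the parameter names follow the OpenAI-compatible API that LM Studio exposes:

```python
import requests

# A minimal sketch of the sampling setup described above, assuming
# LM Studio's local server on its default port. The model identifier is
# a placeholder; temperature and top_p are the values from the report.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "qwen3.5-9b",  # placeholder identifier
        "temperature": 0.6,     # jola's reported setting
        "top_p": 0.95,          # jola's reported setting
        "messages": [
            {"role": "user", "content": "Rewrite this recursive function iteratively: ..."},
        ],
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```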

Additionally, this setup explicitly enables thinking mode: by manually injecting specific parameters into the prompt template, the model runs a "self-thinking" reasoning pass before producing its final answer. On the integration side, calling the local API from tools such as Pi and OpenCode lets developers flexibly configure context length and output limits, building out a complete local AI assistant stack.
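The report does not spell out the exact template injection, so the following is only one plausible pattern: a system instruction that asks the model to emit its reasoning inside <think> tags before the final answer, combined with an explicit output limit. Everything here except the 0.6/0.95 sampling values is an illustrative assumption:

```python
import requests

# One plausible thinking-mode pattern, not jola's exact template: a system
# instruction asks the model to reason inside <think> tags before answering,
# and max_tokens caps the output length. Context length itself is usually
# configured on the server side (e.g. in LM Studio's model settings).
THINKING_SYSTEM_PROMPT = (
    "Before answering, reason step by step inside <think>...</think> tags, "
    "then give the final answer outside the tags."
)

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "qwen3.5-9b",  # placeholder identifier
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 2048,     # output limit
        "messages": [
            {"role": "system", "content": THINKING_SYSTEM_PROMPT},
            {"role": "user", "content": "Why does this recursive directory walk overflow the stack?"},
        ],
    },
    timeout=300,
)
resp.raise_for_status()
reply = resp.json()["choices"][0]["message"]["content"]
# Keep only the text after the reasoning block for the final answer
answer = reply.split("</think>")[-1].strip()
print(answer)
```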

Shift in Perspective: From "Outsourced Assistant" to "Research Partner"

In the report, jola is candid about the gap between local models and top-tier cloud models such as Claude or GPT-4: even at the 9B scale, a local model still loses focus, falls into logical loops, or misreads intent when performing multi-step, complex tasks.

However, this limitation actually fosters a more engaged way of working. Unlike cloud models, which tempt users to outsource their thinking, local models demand clearer instructions and more rigorous guidance. In this interaction, the AI plays the role of a "rubber duck" research assistant with instant recall, rather than a full-stack outsourcing tool.

For users who prioritize data privacy, zero subscription fees, and a controllable development environment, deploying this offline setup on an M4 MacBook is more than a technical experiment: it is a successful return to personal computing autonomy in the face of the trend toward black-box large models.