Anthropic launches Claude Opus 4.5 on Amazon Bedrock, outperforming Sonnet 4.5 and Opus 4.1 in coding, AI agents, computer operations, and office tasks at one-third the cost of Opus series...
Anthropic releases its flagship model, Claude Opus 4.5, which achieves world-leading results in key productivity scenarios such as coding, agent operations, and computer use, and also shows significant improvements in common tasks such as research and presentations. Core capabilities include reasoning and long-term task management, with exceptional performance on real-world software engineering tests.
Saudi AI startup Humain launched the Humain One operating system at the Riyadh Future Investment Initiative. The system aims to replace traditional systems like Windows, supporting natural language interaction, allowing users to complete computing tasks through voice commands. The CEO stated that it will redefine enterprise computing and create an intelligent system capable of understanding human intent.
The UK has unique advantages in AI chips, potentially meeting 5% of global demand. With a strong history in chip design, including early computer innovations and Arm's leadership in mobile chips, it can play a key global role...
Auron AI turns your computer into an intelligent companion, helping you manage tasks, automate operations, and communicate naturally.
Runable is a general-purpose automation agent that can automate any digital task that humans perform on a computer.
A virtual computer assistant that can perform tasks such as searching or creating images.
Vy represents the future of computer interfaces, using advanced artificial intelligence technology to change human-computer interaction.
Trilogix1
Fara-7B is an efficient small language model specially designed by Microsoft for computer usage scenarios. It has only 7 billion parameters and performs excellently in advanced user tasks such as web operations, competing with larger agent systems.
noctrex
Gelato-30B-A3B is a state-of-the-art (SOTA) model fine-tuned for GUI computer usage tasks, offering a quantized version to optimize deployment efficiency. This model is specifically designed to understand and process tasks related to graphical user interfaces.
microsoft
Fara-7B is a small language model developed by Microsoft Research, specifically designed for computer usage scenarios. It has only 7 billion parameters and achieves excellent performance among models of the same scale. It can perform computer interaction tasks such as web automation and multimodal understanding.
almanach
Gaperon-Young-1125-1B is a bilingual (French-English) language model with 1.5 billion parameters, developed by the ALMAnaCH team at the French National Institute for Research in Computer Science and Automation (Inria Paris). The model is trained on approximately 3 trillion high-quality tokens, with a particular focus on language quality and general text generation ability rather than benchmark optimization.
mlfoundations
Gelato-30B-A3B is a state-of-the-art foundation model for GUI computer usage tasks. It is trained on the Click-100k dataset and outperforms previous specialized computer foundation models and larger vision-language models in multiple benchmark tests.
timm
This is a vision Transformer model based on the DINOv3 framework, distilled from the DINOv3 ViT-7B model on the LVD-1689M dataset. It is designed for image feature encoding and efficiently extracts image feature representations suitable for various computer vision tasks.
This is a vision Transformer model based on the DINOv3 architecture, using a small configuration and trained through knowledge distillation on the LVD-1689M dataset. This model is specifically designed for efficient image feature extraction and supports various computer vision tasks such as image classification, feature map extraction, and image embedding.
Piero2411
This is a computer vision model based on the YOLOv8s architecture, specifically designed for barcode and QR code detection. The model has been fine-tuned on a comprehensive dataset containing more than 5000 images, supporting accurate detection and classification of multiple barcode types (such as EAN13, Code128, etc.) and QR codes.
macpaw-research
This is a computer vision model fine-tuned from Ultralytics/YOLO11, specifically designed to detect UI elements in macOS application screenshots. It is part of the Screen2AX project, which generates accessibility metadata using computer vision.
logasanjeev
A powerful computer vision tool capable of classifying, detecting, and extracting text from Indian ID card documents.
lmstudio-community
An image-text to text generation model based on the Transformer architecture, designed specifically for computer/GUI-related scenarios, with intelligent agent capabilities.
Zeta-LLM
Zeta 2 is a multilingual small language model (SLM) with approximately 460 million parameters, built on consumer-grade computers.
Kar1hik
This model is fine-tuned from the DINOv2 architecture to classify diseases in skin lesion images.
onnx-community
This is the ONNX format version of the facebook/dinov2-base model, suitable for computer vision tasks.
nvidia
The first hybrid computer vision model combining the strengths of Mamba and Transformer. It improves the efficiency of visual feature modeling by redesigning the Mamba formulation, and adds self-attention modules in the final layers of the Mamba architecture to improve long-range spatial dependency modeling.
MambaVision is the first hybrid computer vision model combining the strengths of Mamba and Transformer. It enhances visual feature modeling by redesigning the Mamba formulation and incorporates self-attention modules in the final layers of the Mamba architecture to improve long-range spatial dependency modeling.
The first hybrid computer vision model combining the advantages of Mamba and Transformer, enhancing visual feature modeling capability by redesigning the Mamba formulation.
The first hybrid computer vision model combining the strengths of Mamba and Transformer. It improves the efficiency of visual feature modeling through a redesigned Mamba formulation, and adds self-attention modules at the end of the Mamba architecture to improve long-range spatial dependency modeling.
mestrevh
This is a Vision Transformer (ViT) model fine-tuned on a legume dataset for identifying disease conditions in legume leaves.
ETH-CVG
LightGlue is an efficient keypoint detection and matching model for feature matching and pose estimation problems in computer vision.
Contains computer control and automation components for MCP servers
MCP-DBLP is a service based on the Model Context Protocol (MCP) that provides large language models with the ability to access the DBLP computer science literature database, including functions such as search, citation processing, and BibTeX export.
Showcasing the integration of computer vision tools with language models through MCP
An OpenAI agent server based on MCP, providing various specialized agents (such as web search, file search, and computer operation) and a multi-agent coordinator, which can interact with clients (such as the Claude desktop application) over MCP.
A TypeScript-based MCP server for file system editing tools, ported from the Anthropic computer use demo.
A computer vision server built on Ultralytics and MCP, supporting functions such as object detection, image segmentation, and pose estimation.
An MCP server for seamless integration with computer peripherals, providing a unified API to control, monitor, and manage hardware devices, including cameras, printers, audio devices, and screens.
An MCP server that queries computer component prices on the CoolPC website in Taiwan and automatically generates computer configuration quotes.
The YOLO MCP Service is a powerful computer vision service that integrates with Claude AI through the Model Context Protocol (MCP), providing functions such as object detection, segmentation, classification, and real-time camera analysis.
A TypeScript-based MCP server for interacting with DAOs on the Internet Computer
Computer control and automation components of the MCP server
Desktop Commander MCP is a service that enables the Claude desktop application to execute terminal commands on the user's computer and manage processes through the Model Context Protocol (MCP). It provides terminal command execution, process management, file system operations, and code editing functions, supporting long-running commands and diff-based file editing.
An MCP service that allows Claude to control audio playback on the computer
An MCP server based on nut.js that provides comprehensive control functions for the computer screen, mouse, and keyboard, including screenshot, mouse operation, keyboard input, window management, and clipboard access.
An MCP server that provides computer control functions, including mouse and keyboard control, OCR, and window management, built on PyAutoGUI and RapidOCR with no external dependencies.
Koppla is a natural language-based Active Directory management tool that enables querying and operating on user, group, and computer objects through an AI agent.
Claude Desktop Commander MCP is a server tool that allows the Claude desktop application to execute terminal commands and manage processes on the user's computer. It is built on the Model Context Protocol (MCP) and provides file system operations and code editing functions.
Android-MCP is a lightweight open-source project that serves as a bridge between AI agents and Android devices. It enables real-world task operations such as app navigation, UI interaction, and automated testing through the MCP server, without relying on traditional computer vision or pre-set scripts.
An MCP server that provides computer control functions, including mouse and keyboard control, screen capture, and OCR text recognition. It runs cross-platform and requires no external dependencies.
An MCP server that provides audio input/output functions, letting AI assistants such as Claude interact with the computer's audio system, including recording and playing audio files.