Scraper Spider

2026-02-18 17:30
model context protocol stories from the last 14 days
32.  HN Building an n8n AI Agent (Tutorial – Step by Step)
This tutorial provides a comprehensive guide on constructing an AI agent using n8n, a workflow automation tool capable of dynamic decision-making beyond predefined paths, particularly suited for unstructured tasks. The process involves four essential components: a trigger (such as chat or webhook), the AI Agent node to orchestrate operations, sub-nodes including Chat Model, Memory, and Tools, and an output destination. A practical application is demonstrated through building a support triage bot that begins with configuring a Chat Trigger connected to an AI Agent Node. The AI agent leverages language models like Google Gemini to process inputs and determine actions, which could involve responding directly or escalating issues. Effective memory management is critical for maintaining context across sessions, where Simple Memory suffices for testing but PostgreSQL or Redis Memory are recommended for production environments to ensure data persistence. Several challenges associated with deploying AI agents are highlighted: managing persistent memory post-deployment, avoiding endless loops by refining system prompts, ensuring tool call success through robust error handling, and utilizing advanced features like Human-in-the-Loop (HITL) approvals for crucial actions and Model Context Protocol (MCP) triggers in multi-agent systems. The tutorial underscores the importance of practical implementation, encouraging readers to integrate real tools for enhanced functionality. It provides both technical setup details and strategic insights necessary for deploying an effective AI agent within n8n, aiming to equip users with the skills needed to build their own dynamic AI solutions. Keywords: #phi4, AI agent, API key, Chat Trigger, HITL approvals, MCP Trigger, PostgreSQL Memory, Redis Memory, Simple Memory, execution logs, memory, model, n8n, tools, trigger, workflow
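The tutorial's advice on avoiding endless loops and handling tool failures can be sketched outside n8n as a plain agent loop with a hard iteration cap. Everything here is illustrative: n8n configures this behaviour through nodes, not code, and the `decide` policy and tool names are invented.

```python
# Minimal sketch of an agent loop with a hard iteration cap, mirroring the
# tutorial's advice to prevent endless tool-call loops. All names are
# illustrative; n8n expresses this behaviour via nodes, not code.

def run_agent(decide, tools, user_input, max_steps=5):
    """decide(input, history) returns ('respond', text) or ('call', tool, args)."""
    history = []
    for _ in range(max_steps):
        action = decide(user_input, history)
        if action[0] == "respond":
            return action[1]
        _, tool, args = action
        try:
            result = tools[tool](**args)       # the tool call may fail
        except Exception as exc:               # robust error handling:
            result = f"tool error: {exc}"      # feed the error back, don't crash
        history.append((tool, result))
    return "Escalating to a human agent."      # cap reached: escalate, don't loop

# Toy triage policy: look the user up once, then answer.
def decide(user_input, history):
    if not history:
        return ("call", "lookup_user", {"email": "a@example.com"})
    return ("respond", f"Found: {history[-1][1]}")

tools = {"lookup_user": lambda email: f"account for {email}"}
print(run_agent(decide, tools, "My login is broken"))
```

The cap plays the role the tutorial assigns to a well-refined system prompt: a bound the agent cannot talk itself past.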
    theowllogic.com 3 hours ago
39.  HN Show HN: Mimir – Shared memory and inter-agent messaging for Claude Code swarms
Mimir is an advanced tool designed to augment the capabilities of Claude Code agents by facilitating shared memory and inter-agent communication. It addresses a key challenge: agents often lose contextual information between sessions, leading to repeated errors. By implementing features like local storage via DuckDB for storing insights known as "marks," Mimir ensures that knowledge acquired in one session is accessible to subsequent agents. Integration with Cloudflare's bge-m3 embeddings allows it to semantically search past interactions and supply relevant context automatically. The setup process is streamlined through npm, allowing quick initiation of hooks, daemon startup, and multi-agent sessions coordinated by tmux. Mimir features a self-marking system that records significant discoveries, warnings, and decisions during tasks, making these insights available in future engagements. It supports swarm mode and agent teams, enhancing coordination via built-in mechanisms compatible with Claude Code's Agent Teams. A critical component of its functionality is the Model Context Protocol (MCP), enabling agents to exchange messages, search past observations, and share discoveries efficiently. Developers can benefit from a VSCode/Cursor extension that provides real-time monitoring and orchestration controls. Mimir also manages the lifecycle of marks by categorizing them into active, warm, cold, and permanent states based on their relevance. Additionally, it features a Curator Agent for automated knowledge curation by promoting recurring patterns to rule files, thus improving efficiency. The architecture employs a tech stack including Node.js, Hono, DuckDB, Cloudflare Workers AI, React, and TypeScript. With configurable environment variables, Mimir offers flexibility in using RAG embeddings or alternative text search methods. 
Overall, Mimir significantly enhances the coordination and learning capabilities of Claude Code agents by providing them with shared context from past sessions, reducing errors, and boosting productivity. Keywords: #phi4, Agent Teams, Claude Code, Cloudflare bge-m3, DuckDB, ESM, Hono, MCP server, Mimir, Model Context Protocol, Nodejs, RAG, React, Slack integration, TailwindCSS, TypeScript, VSCode Extension, agents, coordination, institutional memory, inter-agent messaging, knowledge hygiene, lifecycle events, local memory, multi-agent orchestration, npm publish, plugin system, shared memory, tmux sessions, vector similarity
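The mark lifecycle described above (active, warm, cold, permanent) can be sketched as an aging function. The thresholds and the `pinned` flag are guesses, not Mimir's actual rules.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of Mimir-style mark lifecycle tiers. The tier names
# come from the summary; the thresholds and the pinned flag are invented.

def mark_state(last_used, now, pinned=False):
    if pinned:                       # e.g. a mark promoted to a rule file
        return "permanent"
    idle = now - last_used
    if idle <= timedelta(days=7):
        return "active"
    if idle <= timedelta(days=30):
        return "warm"
    return "cold"

now = datetime(2026, 2, 18)
print(mark_state(now - timedelta(days=2), now))    # recently used mark
print(mark_state(now - timedelta(days=40), now))   # long-idle mark
```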
    github.com 3 hours ago
43.  HN The Rise of RentAHuman
RentAHuman is an innovative online marketplace co-founded by Alexander Liteplo and Patricia Tani that facilitates the hiring of humans by artificial intelligence agents to perform tasks beyond their virtual capabilities. Inspired by Japan's rental culture and influenced by developments in humanoid robotics, the platform emerged from Liteplo's enthusiasm for AI technology. Utilizing an agent orchestration system named Insomnia, RentAHuman was swiftly developed to offer a range of unique services such as pigeon counting, CBD gummy delivery, and badminton exhibitions. Despite its promising launch being initially overshadowed by a crypto scam attempt that caused concern for Liteplo, the platform quickly garnered attention from diverse users, including an OnlyFans model and an AI startup CEO. RentAHuman exemplifies a paradigm shift in which AI technology is not only displacing traditional jobs but also creating new opportunities by requiring human intervention to fulfill specific tasks that machines cannot autonomously perform. Keywords: #phi4, AI agents, Alexander Liteplo, CEO, Fiverr, Insomnia, Japan, Lemon AI, Model Context Protocol, OnlyFans, OpenClaw, Patricia Tani, RentAHuman, UMA Protocol, Vercel, agent orchestration system, bots, boyfriend girlfriend rental, crypto scammers, humanoid robots, marketplace, platform, viral sense
    www.wired.com 4 hours ago
59.  HN Show HN: CasperAI – A local MCP server for cross-platform engineering context
CasperAI is a Model Context Protocol (MCP) server aimed at unifying and indexing cross-platform engineering data to enhance semantic search capabilities within local environments using SQLite for storage. Its primary function is to link discussions from various platforms, such as Slack conversations, GitHub pull requests, Jira tickets, Notion docs, and source code, thereby creating a cohesive context that aids developers in tracing the evolution of their projects through related communications and documentation. Key features of CasperAI include cross-platform integration with tools like Slack, GitHub, GitLab, Jira, Linear, Sentry, Datadog, and Notion. It facilitates semantic searches by establishing bidirectional links between platform data and source code, enabling users to find relevant discussions, commits, and documentation linked to specific code references. All indexed data is securely stored locally within an SQLite database, ensuring privacy compliance with regulations like GDPR and HIPAA. The server also incorporates automatic redaction of personal identifiable information (PII) before storage to safeguard sensitive data. From a development perspective, CasperAI was efficiently developed by a single developer using Claude Code for code generation, focusing on speed and cross-language compatibility through regex-based pattern matching rather than AST parsing. For developers, CasperAI offers tools for indexing, searching, and managing engineering context with support for CLI operations and customization of PII patterns and rate limits. Commercially, it includes metering systems to track usage across various license tiers and provides commercial support encompassing licensing management and telemetry features, while maintaining privacy compliance by not transmitting sensitive data. 
Looking ahead, CasperAI aims to expand its capabilities by introducing a web UI, supporting multiple Slack workspaces, integrating with GitHub, implementing real-time indexing via webhooks, providing advanced analytics dashboards, enhancing team collaboration tools, and developing cloud deployment templates. Ultimately, CasperAI is tailored for engineering teams focused on preserving institutional knowledge and fostering context-aware collaboration across diverse development platforms. Keywords: #phi4, CasperAI, Claude Code, FTS5, MCP server, PII redaction, SQLite database, Slack integration, codebase linking, knowledge context, local storage, multi-platform indexing, regex pattern matching, semantic search
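The redact-before-store step can be sketched with two regexes. The patterns below (email and a US-style phone number) are illustrative stand-ins for CasperAI's customizable PII pattern set.

```python
import re

# Sketch of regex-based PII redaction before local storage, as CasperAI
# describes. The two patterns are illustrative; the real tool supports
# user-customizable PII patterns.

PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]

def redact(text):
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

msg = "Ping jane.doe@example.com or 555-123-4567 about PR #42"
print(redact(msg))  # PII replaced, code reference kept
```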
    github.com 5 hours ago
64.  HN Firetiger: Long Horizon Agents in Production
Firetiger revolutionizes system operations through the deployment of autonomous "long horizon" agents that independently manage production systems by utilizing production telemetry to proactively detect and resolve issues without human intervention. These agents continuously operate, orchestrating thousands of sessions while processing large-scale telemetry data, leveraging a Git-inspired snapshot system for state management, which ensures seamless operation resumption after interruptions. The architecture is characterized by its durability and scalability, employing S3 for object storage and AWS Lambda functions for computation, ensuring resilience and efficient scaling. It maintains crash consistency with built-in recovery mechanisms facilitated by EventBridge retries. Concurrency issues are managed at the storage layer through atomic operations, enhancing reliability without necessitating distributed locks or consensus protocols. Firetiger's ecosystem utilizes a minimalist toolset based on Google's API Improvement Proposals (AIP), enabling consistent resource interaction across agents via DuckDB for data querying and Bash within secure environments known as chambers. The system dynamically adapts to varying workloads by adjusting partitioning and indexing in real time, optimizing performance according to specific telemetry needs. Additionally, Firetiger supports extensions through the Model Context Protocol, allowing customization while ensuring synchronization with organizational permissions despite its ephemeral nature. This shift from traditional persistent-process models to functional state transformations signifies a promising advancement in managing complex production systems efficiently amidst the growing demands of intelligent machines. 
Keywords: #phi4, Autonomous Agents, Bash, Chambers, Concurrency, Distributed Systems, DuckDB, Failure Recovery, Firetiger, Intelligent Machines, Long Horizon Agents, Model Context Protocol, Monitoring Telemetry, Production Systems, Session Engine, Snapshots, System Requirements
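The claim that concurrency is managed at the storage layer through atomic operations, without distributed locks, amounts to a compare-and-swap on a snapshot version. A minimal sketch, assuming an in-memory stand-in for the object store (in practice S3 conditional writes would supply the atomic primitive):

```python
import threading

# Sketch of lock-free-style concurrency control: commits succeed only when
# the snapshot version is unchanged. The dict plus single lock below models
# the store's atomic primitive; it is not Firetiger's implementation.

class SnapshotStore:
    def __init__(self):
        self._lock = threading.Lock()   # models the store's atomicity
        self.version = 0
        self.state = {}

    def commit(self, expected_version, new_state):
        """Atomically install new_state iff nobody committed in between."""
        with self._lock:
            if self.version != expected_version:
                return False            # lost the race: re-read and retry
            self.state = new_state
            self.version += 1
            return True

store = SnapshotStore()
print(store.commit(0, {"sessions": 1}))    # first writer wins
print(store.commit(0, {"sessions": 99}))   # stale writer is rejected
```

A losing writer simply re-reads the latest snapshot and retries, which is why no consensus protocol is needed.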
    blog.firetiger.com 5 hours ago
84.  HN We scaled our AI assistant to use virtually unlimited number of tools
The document presents an innovative three-layer architecture designed to scale AI assistants effectively by managing a multitude of tools. Initially, traditional methods relying on manual tool searches proved inefficient due to the limitations of Large Language Models (LLMs) concerning context management and their ability to handle numerous options. A breakthrough was achieved with semantic tool retrieval using vector embeddings, facilitating efficient discovery without overloading the model's context window. The architecture comprises three key components:

1. **Communications Agent**: This agent is solely dedicated to managing conversations, allowing it to focus on understanding user intent and tone while handling only a few task-related tools. By separating conversation management from tool handling, it enhances conversational quality without distractions.

2. **Executor Agent**: Responsible for orchestrating tasks, this layer uses semantic retrieval to identify necessary tools and coordinates actions across multiple integrations or subagents as needed, ensuring efficient execution paths.

3. **Provider Subagents**: Each integration, such as Gmail or GitHub, is managed by a specialized subagent with domain expertise, reducing errors and optimizing task execution. These agents maintain contextual memory for improved interactions over time and adapt to user-specific preferences through experience.

The system supports both built-in and custom integrations via the Model Context Protocol (MCP), offering seamless connectivity for compatible tools. Subagents evolve from their interactions, refining efficiency by learning procedural patterns and user preferences with each use. Future developments include a self-learning skills layer aimed at accelerating task execution for recurring processes and multi-step workflows by bypassing routine routing for familiar sequences, thus enhancing responsiveness without sacrificing accuracy.
The open-source codebase of Gaia provides transparency and flexibility, allowing users to implement or extend the system as needed. This architecture represents a significant advancement in AI assistant scalability, balancing efficiency, correctness, and user adaptability. Keywords: #phi4, AI assistant, ChromaDB, Communications Agent, Executor Agent, Model Context Protocol, OAuth tokens, Provider Subagents, ToolRegistry, memory learning, self-learning skills layer, semantic search, three-layer architecture, tools, vector store
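Semantic tool retrieval reduces to ranking tools by embedding similarity and exposing only the top matches to the model. A sketch with toy 3-d vectors (a real system such as Gaia's would use a learned embedding model and a vector store like ChromaDB):

```python
import math

# Sketch of semantic tool retrieval: embed the request, rank tools by cosine
# similarity, hand only the top-k to the model. The 3-d "embeddings" below
# are toy vectors chosen for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

TOOL_EMBEDDINGS = {
    "gmail.send":          (0.9, 0.1, 0.0),
    "github.create_issue": (0.1, 0.9, 0.0),
    "calendar.book":       (0.0, 0.1, 0.9),
}

def retrieve_tools(query_embedding, k=1):
    ranked = sorted(TOOL_EMBEDDINGS,
                    key=lambda t: cosine(query_embedding, TOOL_EMBEDDINGS[t]),
                    reverse=True)
    return ranked[:k]

# "file a bug" should land near the GitHub axis in this toy space
print(retrieve_tools((0.2, 0.8, 0.1), k=1))
```

The point is that the context window only ever sees `k` tool schemas, however many tools exist in the registry.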
    gaia-fork-k7yngvswe-gaias-projects-2dead09b.vercel.app 6 hours ago
105.  HN Show HN: CSL MCP Server – Write and Verify AI Safety Policies from Claude/Cursor
CSL-Core is an innovative open-source policy engine that aims to significantly improve AI safety by enforcing constraints in a deterministic manner. At its core, it uses the Constitutional Specification Language (CSL) and employs Z3 for formal verification, providing tools for writing, verifying, and simulating policies with mathematical precision, thereby eliminating reliance on large language models (LLMs) which often contain inherent loopholes. CSL-Core's architecture ensures that rules are externally enforced with high rigor. The system offers deterministic safety through a runtime engine and guarantees model agnosticism by functioning independently of specific AI models or training data. Its policies are mathematically verified using Z3, ensuring they meet stringent standards. Additionally, every decision made can be audited and verified, offering proof of compliance which is crucial for maintaining trust in critical systems. Key functionalities include a command-line interface (CLI) for policy testing, seamless integration with LangChain to boost AI agent security, and built-in tools like `verify_policy`, `simulate_policy`, `explain_policy`, and `scaffold_policy`. These capabilities allow CSL-Core to block sophisticated attacks that traditional LLM-based methods are vulnerable to, thus providing robust safety layers. CSL-Core is easy to install using pip or Docker, with configurations tailored for various environments. It supports diverse use cases such as fintech security, AI agent protection, decentralized autonomous organization (DAO) governance, and healthcare compliance. The project actively encourages community involvement and has future plans to introduce TLA+ verification and cloud deployment templates. Licensed under Apache 2.0, CSL-Core is accessible while also providing commercial options for enhanced enterprise features. 
This dual approach ensures broad usability and the potential for extensive adoption across multiple sectors needing reliable AI safety mechanisms. Keywords: #phi4, AI Safety, Auditability, CLI Tools, CSL-Core, Causal Inference, Enterprise Edition, Formal Verification, LangChain Integration, Model Agnostic, Multi-Tenancy Support, No-Code Development, Policy Engine, Temporal Logic, Z3 Verification
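The core contrast, rules enforced deterministically outside the model rather than by prompting an LLM, can be sketched as a rule evaluator. CSL-Core itself verifies policies with Z3 and has its own CSL syntax; the toy engine below only illustrates the external-enforcement idea, and every rule and field name is invented.

```python
# Toy deterministic policy engine in the spirit of CSL-Core's externally
# enforced constraints. Not CSL syntax and not its API; the point is that
# the same input always yields the same verdict, with no LLM in the loop.

POLICY = [
    # (predicate over the proposed action, verdict when it matches)
    (lambda a: a["amount"] > 10_000, "deny: exceeds transfer cap"),
    (lambda a: a["dest"] not in {"savings", "checking"}, "deny: unknown destination"),
]

def enforce(action):
    """Return the first matching denial, or 'allow'."""
    for predicate, verdict in POLICY:
        if predicate(action):
            return verdict
    return "allow"

print(enforce({"amount": 500, "dest": "savings"}))     # permitted
print(enforce({"amount": 50_000, "dest": "savings"}))  # denied, every time
```

No amount of prompt injection changes the answer, which is the property the LLM-based approaches mentioned above cannot offer.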
    pypi.org 8 hours ago
119.  HN Show HN: Satgate-proxy – Hard budget caps for MCP tool calls (zero deps, npx)
Satgate-proxy is a specialized tool designed to enforce strict budget caps on Model Context Protocol (MCP) server calls made by AI agents utilizing paid APIs, addressing concerns of uncontrolled spending. The proxy operates in two distinct modes: Local Mode and SaaS Mode. In Local Mode, Satgate-proxy acts as an intermediary between MCP clients such as Claude Desktop or Cursor and the server, allowing users to enforce a budget cap locally without necessitating any server setup, API key, or account. Users initiate this mode using `npx satgate-proxy`, configuring it with CLI flags (e.g., `--budget 5.00`) or through a configuration file (`satgate.yaml`). This mode intercepts tool calls, deducting costs from the budget and blocking further interactions once the cap is reached. SaaS Mode caters to teams and enterprises by enforcing budgets at the server level using L402 macaroons for added security and scalability. Configuration in this mode requires command arguments along with an API key obtained from a SatGate dashboard, ensuring robust budget management suitable for larger environments. The tool boasts zero dependencies, running purely on Node.js built-ins via `npx`, which simplifies usage and deployment processes. Satgate-proxy also offers customizable pricing configurations to accommodate various tools, allowing users to set specific costs per call. As an open-source project licensed under MIT, it is accessible through its official homepage and GitHub repository, making it widely available for integration and use. Keywords: #phi4, AI agent, API key, CLI flags, JSON-RPC, L402 macaroons, MCP tool calls, Nodejs built-ins, SaaS mode, Satgate-proxy, budget caps, child process, cloud dashboard, config file, desktop configuration, hard cap, local mode, npx, pricing, proxy, server-side enforcement, spending limit
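The local-mode mechanism, deduct a per-tool cost from a hard budget and refuse calls once it is spent, can be sketched as follows. The prices and tool names are illustrative stand-ins for a `satgate.yaml` pricing configuration.

```python
# Sketch of Satgate-proxy's local mode: sit between the MCP client and the
# real server, deduct a per-tool cost from a hard budget, and block calls
# once the cap is reached. Prices and tool names are invented.

class BudgetProxy:
    def __init__(self, budget, prices, default_price=0.01):
        self.remaining = budget
        self.prices = prices
        self.default_price = default_price

    def call(self, tool, upstream):
        cost = self.prices.get(tool, self.default_price)
        if cost > self.remaining:
            return {"error": f"budget exhausted ({self.remaining:.2f} left)"}
        self.remaining -= cost
        return {"result": upstream(tool)}  # forward to the real MCP server

proxy = BudgetProxy(budget=0.05, prices={"web_search": 0.04})
upstream = lambda tool: f"{tool} ok"
print(proxy.call("web_search", upstream))   # spends 0.04
print(proxy.call("web_search", upstream))   # cap reached: blocked
```

Because the check happens in the proxy, the cap holds no matter what the agent asks for.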
    github.com 8 hours ago
   https://github.com/SatGate-io/satgate   7 hours ago
165.  HN Build an MCP server with Laravel (and use it to publish this post)
The article provides a comprehensive guide on creating an MCP (Model Context Protocol) server using Laravel, enabling AI assistants like Claude to interact directly with application functionalities without REST APIs or SDKs. It details the process of utilizing PHP classes and Laravel features to expose specific actions such as creating, retrieving, updating, and publishing blog posts. The tutorial outlines key steps including installing the `laravel/mcp` package, defining a server class with descriptive attributes, and constructing tool classes that specify input schemas and manage operations like post creation and publication. These tools incorporate validation and idempotency to ensure secure interactions. Additionally, it covers registering these servers for both local and remote access and testing them using Laravel’s framework. The article illustrates the practical benefits of this approach by demonstrating a blog MCP server's capability to draft, revise, and publish articles autonomously, highlighting efficiency and security through structured interactions. Ultimately, the article underscores the potential of integrating AI assistants with Laravel applications seamlessly, treating existing codebases as first-class tools. Keywords: #phi4, AI assistants, Blog management, Claude Code AI assistant, CreatePostTool, Eloquent models, GetPostTool, Laravel, ListPostsTool, MCP server, MCP specification, PHP classes, PublishPostTool, Python, REST API, SDK, TypeScript, UpdatePostTool, authentication tokens, bearer token auth, business logic, draft posts, guardrails, idempotent, laravel/mcp package, published_at, read-only, validation
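The tool pattern the article builds (input-schema validation plus idempotency) is language-agnostic, so it can be sketched outside Laravel. Below is a Python stand-in rather than the article's PHP, with invented field names and an in-memory dict in place of Eloquent models.

```python
# Language-agnostic sketch of the article's tool pattern: validate the
# input schema, and make repeated calls with the same idempotency key
# return the same post instead of creating a duplicate. All names invented.

posts, seen_keys = {}, {}

def create_post(args):
    # validation, as the Laravel tools do via their input schemas
    if not args.get("title") or not args.get("body"):
        return {"error": "title and body are required"}
    # idempotency: the same key never creates a second post
    key = args.get("idempotency_key")
    if key in seen_keys:
        return {"id": seen_keys[key], "created": False}
    post_id = len(posts) + 1
    posts[post_id] = {"title": args["title"], "body": args["body"], "published_at": None}
    if key is not None:
        seen_keys[key] = post_id
    return {"id": post_id, "created": True}

print(create_post({"title": "Hello", "body": "draft", "idempotency_key": "k1"}))
print(create_post({"title": "Hello", "body": "draft", "idempotency_key": "k1"}))  # no duplicate
```

Idempotency matters here because an AI assistant may retry a tool call it believes failed; without the key, every retry would publish another copy.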
    thunk.dev 14 hours ago
177.  HN MCP works because tools are dumb. That assumption has an expiry date
The text explores the evolution of AI communication protocols, highlighting MCP (Model Context Protocol) developed in 2024 by Anthropic as a pivotal integration tool that standardized AI connections to external capabilities like databases and APIs through small servers. As with USB-C's role in technology, MCP aimed to provide a universal interface to resolve integration challenges. However, the emergence of more sophisticated intelligent agents from companies such as Expedia indicates a potential decline in the necessity for rigid protocols like MCP. These advanced agents might enable direct communication using natural language, thus bypassing predefined schemas. Anthropic's Agent Teams project exemplifies this trend towards agent-to-agent interaction via natural language, despite Anthropic's role in creating MCP. This shift suggests that future AI communication may increasingly depend on autonomous negotiation between agents rather than human-designed protocols like MCP or A2A (Google's protocol). The text forecasts a move away from structured communication tools as intelligent agents become more prevalent and capable of managing complex interactions independently. Concluding, the piece predicts an impending end to the era dominated by human-designed AI communication protocols. As agents develop capabilities for sophisticated autonomous interaction, companies that focus on enhancing agent intelligence rather than building protocol infrastructure are likely to adapt successfully in this evolving landscape. Keywords: #phi4, A2A, AI, AI models, API, Anthropic, Expedia, MCP, Phase 3, agents, communication, connectors, conversation, determinism, endpoints, integration, intelligence, latency, natural language, negotiation, orchestration, protocol, security, tools
    productfit.substack.com 17 hours ago
182.  HN Multi-Language MCP Server Performance Benchmark
Thiago Mendes' research at TM Dev Lab presents a detailed performance evaluation of Model Context Protocol (MCP) server implementations across Java, Go, Node.js, and Python. Through rigorous testing involving 3.9 million requests over three rounds, the study benchmarks these languages based on latency, throughput, resource efficiency, and reliability. Key findings indicate that both Java and Go achieve sub-millisecond latencies with high throughput rates exceeding 1,600 requests per second, significantly outperforming Node.js and Python by factors of 10-30x in terms of latency. In terms of resource usage, Go demonstrates exceptional efficiency, maintaining an average memory footprint of just 18MB compared to Java's 220MB, while both languages show consistent performance with minimal variability. All implementations proved reliable, evidenced by a 0% error rate across all requests. The study also highlights language-specific strengths: Java is optimal for CPU-intensive tasks like Fibonacci calculations; Go excels in I/O operations such as data fetching; Python, however, struggles under its Global Interpreter Lock (GIL), especially with CPU-bound tasks. Based on these findings, the research recommends using Go for high-load production environments due to its balance of performance and resource efficiency, particularly in cloud-native settings. Java is advised when minimal latency is critical, while Node.js may be suitable for moderate traffic situations but not recommended for high-load production owing to potential CPU saturation issues. Python is best reserved for low-traffic development or testing scenarios. Ultimately, the study concludes that Go offers a compelling choice for MCP deployments in production environments, providing performance on par with Java at substantially lower resource costs, making it ideal for scalable and cost-effective cloud-native applications. 
Further research directions include exploring alternative JVM implementations, optimizing Python/Node.js configurations, examining multi-core scaling, real-world application scenarios, and investigating advanced protocol features. The comprehensive benchmark suite is available in the project repository for further analysis. Keywords: #phi4, Async I/O, Benchmark, Bidirectional Communication, CPU Utilization, Cloud-Native, Cold Start Time, Containerized Deployments, Docker, Error Rates, Event Loop, Experimental Analysis, GIL Contention, Garbage Collection, Go, Goroutines, High-Load Scenarios, JVM Tuning, Java, Latency, Load Testing, MCP, Memory Footprint, Multi-Language, Multi-Worker Configurations, Nodejs, Per-Request Instantiation, Performance Analysis, Production Readiness, Python, Reliability, Resource Contention, Resource Efficiency, Scalability, Security Considerations, Server Implementations, Shared Instances, Static Compilation, Streaming Responses, Throughput, Tool-Specific Performance, Virtual Users
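Latency figures like the ones reported above are typically derived from raw per-request samples via percentiles. A sketch of the nearest-rank method, with made-up sample values rather than the study's data:

```python
# Sketch of how benchmark latency figures are usually computed from raw
# samples: sort and index at the percentile (nearest-rank method).
# The sample values below are invented, not the study's measurements.

def percentile(samples, p):
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

latencies_ms = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 3.0, 12.0]
print("p50:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))
```

Tail percentiles (p99) matter more than means for MCP servers, since a single slow tool call blocks the whole agent turn.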
    www.tmdevlab.com 19 hours ago
184.  HN Show HN: Rot – Financial Intelligence MCP Server
"Rot," a new Model Context Protocol (MCP) server, has been introduced to harness financial intelligence by utilizing Reddit's retail sentiment for generating options trading signals. This tool empowers AI assistants to function as advanced financial advisors through real-time data access and natural conversational delivery of structured investment insights. With an extensive 185,000 lines of code and a nine-stage AI pipeline, Rot launched with immediate adoption from 90 users on its first day. By making sentiment analysis available freely via Reddit—a resource typically monetized by Wall Street firms—Rot achieved rapid growth in five days, evidenced by 9,000 GitHub clones and an impressive 18.4% conversion rate of visitors to sign-ups. Performance metrics indicate a robust 52% win rate for live trades, compared to a backtest result of 58.8%, acknowledging concerns about overfitting typically associated with financial models. Rot stands out as the first MCP server to integrate financial intelligence into AI interactions, allowing users to query market activities and receive direct trading signals from their AI tools. This innovative approach distinguishes Rot in the field of financial technology, making it a pioneering solution for real-time investment insights through AI-enhanced platforms. For further details or access, visitors can explore [Rot's MCP Server](https://web-production-71423.up.railway.app/mcp-server). Keywords: #phi4, AI assistants, AI pipeline, MCP server, Model Context Protocol, Reddit, external data sources, financial intelligence, natural conversation, sentiment, signals, trading signals, unusual activity alerts
    web-production-71423.up.railway.app 19 hours ago
246.  HN The Model Context Protocol Book
"The Model Context Protocol (MCP) Book" is an extensive guide aimed at developers seeking to build and deploy MCP servers and clients, based on an open standard by Anthropic introduced in November 2024. Designed for backend, full-stack developers, technical leads, and those interested in AI agent integration processes like Claude's, it requires no previous MCP knowledge but suggests proficiency in JSON, APIs, and languages such as TypeScript or Python. The book spans 18 chapters, offering a linear learning path from basic concepts to advanced deployment strategies, covering architecture, wire protocols, resource management, transport methods, server/client construction in TypeScript and Python, SDKs, configuration, security, testing, debugging, and deployment. Each chapter is self-contained, allowing readers to focus on specific topics such as protocol details or practical coding exercises. The book aims to equip readers with the knowledge to integrate MCP into existing products, evaluate its application within organizations, and explore future developments in the ecosystem. It aligns with the current MCP specification revision dated 2025-11-25, providing resources at modelcontextprotocol.io and source code on GitHub, where users can contribute or report issues under an open-source license. Keywords: #phi4, AI applications, APIs, JSON, MCP, Model Context Protocol, Python, SDKs, TypeScript, architecture, clients, deployment, ecosystem, open standard, security, servers
    cloudstreet-dev.github.io a day ago
251.  HN Route 5k MCP endpoints through a single LLM tool
MCP Fusion is a TypeScript framework engineered to optimize the routing of over 5,000 endpoints through a single Large Language Model (LLM) by addressing common issues such as context exhaustion and routing confusion found in standard Model Context Protocol (MCP) servers. The framework achieves this through efficient consolidation of related operations into fewer tools, thereby minimizing token usage, preventing hallucinations, and simplifying server code. Key features of MCP Fusion include build-time multiplexing and context gating to group similar operations under a single tool, reducing the number of tools seen by the LLM. It implements a 3-layer context gating strategy for effective token management, ensuring scalability and efficiency. Pre-compiled middleware enables zero runtime overhead by compiling middleware chains at build time. The framework employs Token-Oriented Object Notation (TOON) to optimize description tokens and utilizes Zod's merge and strip functionalities for type-safe schema composition. It also supports hierarchical grouping and tag filtering for modular action organization, alongside selective tool exposure based on tags. MCP Fusion emphasizes immutability after build through freeze-after-build techniques to prevent post-registration mutations and isolates errors to enhance debugging capabilities. Architecturally, it includes a domain model layer with hierarchical entity management and a build-time strategy engine that supports features such as bidirectional converters, annotation aggregation, and schema collision detection. Comprehensive documentation is provided in official guides covering aspects from getting started to architecture details, scaling strategies, middleware patterns, introspection API usage, and APIs for enterprise compliance and auditing. Overall, MCP Fusion aims to streamline large-scale MCP environments by ensuring efficient LLM tool routing, enhancing security boundaries, and reducing operational complexity. 
Keywords: #phi4, LLM, MCP, TOON, TypeScript, Zod, build-time engine, context collapse, domain model, endpoints, error isolation, framework, hierarchical grouping, introspection API, mcp-fusion, middleware, multiplexing, schema, strategy pattern, tag filtering, token optimization, tool consolidation
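The consolidation idea is that many related endpoints become actions of one multiplexed tool, so the LLM sees a single `github` tool instead of dozens. A sketch of such a dispatcher; the action names and handlers are invented and this is not mcp-fusion's API.

```python
# Sketch of tool multiplexing: one tool, many actions. The LLM's tool list
# shrinks from N entries to one schema with an `action` parameter, and
# unexposed operations are simply absent. All names are invented.

GITHUB_ACTIONS = {
    "create_issue": lambda args: f"issue '{args['title']}' created",
    "close_issue":  lambda args: f"issue #{args['number']} closed",
    "list_repos":   lambda args: ["mcp-fusion", "demo"],
}

def github_tool(action, **args):
    handler = GITHUB_ACTIONS.get(action)
    if handler is None:
        return {"error": f"unknown action '{action}', expected one of {sorted(GITHUB_ACTIONS)}"}
    return {"result": handler(args)}

print(github_tool("create_issue", title="context collapse"))
print(github_tool("delete_repo", name="demo"))  # gated: not an exposed action
```

The error branch doubles as context gating: an action left out of the table cannot be called, whatever the model hallucinates.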
    github.com a day ago
290.  HN Show HN: CasperAI – A local MCP server for cross-platform engineering context
CasperAI is designed as a local Model Context Protocol (MCP) server that centralizes and indexes data across various development platforms, creating bidirectional links with source code to enrich engineering context. It integrates tools such as Slack, GitHub, Jira, GitLab, Sentry, Datadog, and Notion, offering semantic search capabilities that combine team discussions, code references, project management contexts, and documentation into a unified layer. CasperAI's key features include local data storage using SQLite for privacy compliance, cross-platform search for comprehensive context retrieval, and regex-based code mapping to extract code references from natural language inputs like Slack messages. The system emphasizes security with measures like PII redaction and secure authentication practices. Developed rapidly with tools such as Claude Code, CasperAI currently uses regex for its versatility but plans future enhancements using AST-based symbol resolution. Commercially, it includes metering, device identification, telemetry options, and tiered licensing to accommodate varied usage needs. CasperAI's architecture consists of components like the MCP server, security gatekeeper, PII redactor, and SQLite storage, forming a cohesive environment for managing engineering context. The project encourages community contributions, offers comprehensive documentation, and outlines future developments such as web UI enhancements, real-time indexing, advanced analytics dashboards, and cloud deployment templates. Keywords: #phi4, CasperAI, Claude Code, FTS5, MCP server, PII redaction, SQLite database, Slack integration, codebase linking, knowledge context, local storage, multi-platform indexing, regex pattern matching, semantic search
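The regex-based code mapping described here (pulling code references out of natural-language Slack messages) can be sketched with two patterns, one for file paths and one for function-style symbols. Both patterns are illustrative, not CasperAI's actual set.

```python
import re

# Sketch of regex-based code mapping: extract likely code references from a
# natural-language message so discussions can be linked to source files.
# The two patterns are illustrative stand-ins for CasperAI's real ones.

FILE_RE = re.compile(r"\b[\w./-]+\.(?:py|ts|go|java|rb)\b")
SYMBOL_RE = re.compile(r"\b[A-Za-z_]\w*\(\)")

def extract_refs(message):
    return {"files": FILE_RE.findall(message),
            "symbols": SYMBOL_RE.findall(message)}

msg = "The bug is in src/auth/session.py, probably refresh_token() again"
print(extract_refs(msg))
```

This is the cheap, cross-language approach the project chose over AST parsing: no per-language tooling, at the cost of occasional false positives.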
    github.com a day ago
305.  HN Universal Commerce Protocol (UCP)
The Universal Commerce Protocol (UCP) is an open-source initiative developed by Google in partnership with major industry players such as Shopify, Etsy, Wayfair, Target, and Walmart. Its primary objective is to enhance the landscape of agentic commerce by streamlining interactions across consumer interfaces, businesses, and payment providers via a unified language and functional primitives. UCP not only supports existing retail systems but also integrates seamlessly with protocols like Agent Payments Protocol (AP2). It ensures secure transactions through APIs, Agent-to-Agent communications, and the Model Context Protocol. For businesses, UCP offers the ability to present their products across various consumer platforms such as Google Search's AI Mode and Gemini app, thereby maintaining flexibility in the checkout experience. This protocol simplifies the integration process for AI platforms by providing standardized APIs while allowing flexibility with existing frameworks like MCP and A2A. Developers are encouraged to contribute to this evolving, community-driven standard. Payment providers gain from UCP through its modular payment handler design that facilitates interoperability and secure transactions, backed by cryptographic proof of user consent. Meanwhile, consumers benefit from a seamless shopping experience characterized by trusted brands, ensuring value and confidence in their purchases. UCP addresses traditional tech infrastructure challenges by reducing integration complexity via a single integration point, promoting cross-platform interoperability through shared language, and offering an extensible architecture that adapts to new agentic experiences. Security is paramount with tokenized payments and verifiable credentials, supported by various transport methods including A2A, MCP, and APIs. 
Implementing UCP involves setting up business servers for API hosting, adding sample products, preparing for agent interactions, discovering business capabilities, initiating checkout sessions, and applying discounts. This dynamic discovery of features and endpoints eliminates the need for hard-coded integrations. Google's reference implementation of UCP facilitates seamless purchases across its conversational platforms, including AI Mode in Search and Gemini, utilizing Google Pay. In summary, UCP empowers businesses, developers, payment providers, and consumers by streamlining commerce interactions, enhancing security measures, and supporting diverse agentic experiences across various platforms. Keywords: #phi4, A2A, AI Mode, AP2, APIs, Adyen, Agent Payments Protocol (AP2), American Express, Best Buy, Etsy, Flipkart, Gemini app, Google, Google Pay, JSON manifest, MCP, MCP bindings, Macy's Inc, Mastercard, Merchant of Record, Model Context Protocol (MCP), N x N integration bottleneck, REST API, SQLite database, Shopify, Shopify Pay, Stripe, Target, The Home Depot, UCP, Universal Commerce Protocol, Visa, Walmart, Wayfair, Zalando, agent communication, agent frameworks, agentic commerce, agentic shopping, applied discounts, business capabilities, business logic, business server, buyer information, cart checkout, checkout experience, checkout session, checkout-sessions, consumer interfaces, cryptographic proof, currency, digital commerce, discount codes, discounts, dynamic pricing, idempotency-key, instant transactions, interoperability, inventory checks, line_items, links, mock_payment_handler, open-source, payment handlers, payment instruments, payment methods, product discovery, request-id, sample products, security-first approach, status, tokenized payments, totals, verifiable credentials
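The checkout-session step can be sketched with a hypothetical request body. Field names below (line_items, totals, idempotency-key) follow the keywords in the summary but are illustrative, not the published UCP schema:

```python
import uuid

# Hypothetical sketch of a UCP-style checkout session body; field names are
# assumptions drawn from the keyword list, not the actual specification.
def build_checkout_session(line_items, currency="USD"):
    total = sum(item["unit_price"] * item["quantity"] for item in line_items)
    return {
        # an idempotency key lets a retried request be applied exactly once
        "idempotency-key": str(uuid.uuid4()),
        "currency": currency,
        "line_items": line_items,
        "totals": {"grand_total": total},
    }

session = build_checkout_session(
    [{"sku": "LAMP-01", "unit_price": 4999, "quantity": 2}]
)
print(session["totals"]["grand_total"])  # 9998
```

The idempotency key is the piece that makes agent-driven retries safe: a duplicate submission with the same key should not create a second order.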
    developers.googleblog.com a day ago
372.  HN Show HN: MCP Codebase Index – 87% fewer tokens when AI navigates your codebase
The MCP Codebase Index enhances AI coding assistants' navigation through large codebases by significantly reducing token usage in queries (by 87% on average). This tool parses code into structural metadata, including functions, classes, imports, and dependency graphs, and provides 17 query tools via the Model Context Protocol (MCP) for efficient codebase exploration. It supports multiple programming languages like Python, TypeScript/JavaScript, and Markdown using Python's `ast` module and regular expressions, with no runtime dependencies beyond requiring Python 3.11 or higher. The tool is easily installable via pip with the command `pip install "mcp-codebase-index[mcp]"`, while omitting `[mcp]` allows for programmatic API use without an MCP server. For persistent connections, it integrates with OpenClaw through `openclaw-mcp-adapter` and offers configuration options via `.mcp.json` or directly in the Python module. The development of this tool is rooted in the RMLPlus project and incorporates the Recursive Language Models framework. It supports dual licensing: AGPL-3.0 for open-source use, with a commercial license required for proprietary applications. Developers can install the project locally using `pip install -e ".[dev,mcp]"` and employ pytest alongside ruff for testing and code quality checks. Keywords: #phi4, AI coding assistants, Claude Code configuration, MCP Codebase Index, MCP server, Model Context Protocol, OpenClaw integration, Python AST, development, dual-licensed, installation, language support, performance note, programmatic usage, query tools, regex, structural metadata, token reduction
    github.com a day ago
   https://github.com/MikeRecognex/mcp-codebase-index   a day ago
   https://lftw.dev   a day ago
373.  HN Show HN: MCP Storage Map – One MCP Server for MySQL, MongoDB, and Athena
The MCP Storage Map is an open-source server developed using TypeScript to facilitate querying multiple databases through a unified interface, supporting MySQL, MongoDB, and AWS Athena. Designed for simplicity, it allows AI assistants like Claude or Cursor to interact with these databases without handling separate connections. A key feature is its read-only access by default, enhancing security by requiring explicit permission for write operations. The server offers several essential features: a unified querying toolset across various database technologies, management of multiple simultaneous connections tagged as PROD, STAGING, etc., and extensibility via the McpConnector interface to integrate new database connectors effortlessly. Installation is straightforward using npm, with configuration relying on setting environment variables for each connection. The architecture of MCP Storage Map consists of a central server implementing tools such as query execution, collection listing, and more, while specific connectors adhere to the McpConnector interface, tailored to supported databases like MySQL, MongoDB, and Athena. Security practices emphasize using environment variables to handle sensitive data, maintaining write access as disabled unless explicitly needed. Development guidelines include steps for cloning the repository, installing dependencies, running in development mode, building, testing, and linting the project. The server is released under an MIT license, promoting open-source collaboration and usage flexibility. Keywords: #phi4, AI assistants, Athena, MCP Storage Map, MIT license, MongoDB, MySQL, TypeScript, configuration, database connectors, development, environment variables, extensible architecture, multiple connections, read-only, unified interface
    github.com a day ago
391.  HN Show HN: Neko – AI agent runtime that fits on a Raspberry Pi Zero 2W
Neko is an AI agent runtime optimized for low-cost hardware such as the Raspberry Pi Zero 2W or budget VPS, operating as a single static binary written in Rust. It efficiently manages memory through file-based storage using markdown files, supporting both short-term and long-term data retention with mechanisms to prevent data bloat. Neko integrates seamlessly with external tools via the Model Context Protocol (MCP) and enables user interaction through Telegram messaging support. Key features of Neko include compatibility with OpenResponses-compatible LLM backends such as OpenAI or Ollama, enabling robust language model interactions. It supports file-based memory operations such as write, replace, and search using markdown files. The system allows the scheduling of tasks via cron jobs, which can be set for recurring or one-time execution, delivering results through various channels. Neko's architecture includes support for AgentSkills.io-compatible skills, defined in SKILL.md files with YAML frontmatter, enhancing its extensibility and functionality. Additionally, it facilitates user interaction via a Telegram bot, providing an accessible interface for communication. Neko also offers a sandboxed environment for Python code execution, ensuring safe operation. The installation and configuration of Neko are straightforward, supporting platforms like Linux and macOS. Users can manage configurations and memory through simple command-line instructions, making Neko an attractive solution for those in need of a lightweight yet capable AI agent system. Keywords: #phi4, AI agent, MCP tool support, Neko, OpenResponses-compatible LLM, Raspberry Pi Zero 2W, Rust, Telegram integration, VPS, cron jobs, file-based memory, markdown files, memory management, sandboxed Python, static binary
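A SKILL.md with YAML frontmatter, as described above, might look like the following. The frontmatter fields follow the common AgentSkills convention (name, description) and are assumptions, not taken from Neko's documentation:

```markdown
---
name: morning-briefing
description: Summarize overnight messages and post a digest each morning
---

# Morning Briefing

When triggered by the scheduled cron job, search memory for messages received
overnight, summarize them with the configured LLM, and deliver the digest via
the Telegram channel.
```

The frontmatter gives the runtime machine-readable metadata for skill discovery, while the markdown body carries the natural-language instructions the agent follows.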
    github.com a day ago
396.  HN Show HN: The first financial intelligence MCP server live trading signals Claude
The announcement introduces a Model Context Protocol (MCP) server developed by Mattbusel that provides real-time financial intelligence to AI clients such as Claude. The server delivers trading signals sourced from Reddit, SEC filings, FDA approvals, and Congressional trades, designed for seamless integration without the need for API keys or installations; users can simply input a URL into their Claude Desktop configuration. Built with Python/FastMCP and hosted on Railway, this server is part of the ROT (Reddit Options Trader) platform, which was developed in nine days and comprises a 165K-line codebase. The system processes social media data through a nine-stage AI pipeline to generate actionable trading signals. By utilizing the open-standard protocol, the MCP server allows AI assistants to access current financial data and insights, thereby enhancing their ability to provide live market information during conversations. Further details on this project can be found on GitHub. Keywords: #phi4, AI assistants, AI pipeline, Congressional trades, FDA approvals, FastMCP, GitHub, MCP server, Model Context Protocol, Python, Python/FastMCP, ROT, Railway, Reddit, SEC filings, external data sources, financial intelligence, live trading signals, sentiment data, tools, unusual activity alerts
    web-production-71423.up.railway.app a day ago
398.  HN Access public data insights faster: Data Commons MCP is now hosted on GCloud
In September 2025, Data Commons launched its Model Context Protocol (MCP) server on Google Cloud Platform to address challenges in AI agent interactions with its data, which were previously managed through local Python environments via a Gemini CLI extension. This shift to a hosted service was driven by the need for compatibility with high-security settings and scalable hosting solutions. The new web-hosted MCP service eliminates concerns about environment setup and security compliance, allowing seamless connection for users. It supports natural language queries to extract insights from trusted data sources. Existing users of the Gemini CLI extension are automatically transitioned to this cloud-based version, while new users require a free API key and configuration updates for access. This strategic move ensures improved scalability, enhanced security, and streamlined user experience in accessing Data Commons' resources. Keywords: #phi4, AI, AI agents, API key, Analyst insights, Configuration, Data Commons, Data exploration, Developer tools, Free service, GCloud, Gemini CLI, Google Cloud Platform, High-level questions, LLM, Local server, MCP, Natural language, Python, Python environments, Query agents, Resource management, Scalability, Security, Security compliance, Statistical answers, Trusted sources, Version releases
    developers.googleblog.com a day ago
   https://datacommons.org   a day ago
   https://github.com/datacommonsorg/agent-toolkit   a day ago
   https://github.com/datacommonsorg/agent-toolkit/bl   a day ago
411.  HN Forge: Scalable Agent RL Framework and Algorithm
The Forge framework addresses scalability challenges in reinforcement learning (RL) for complex agents by balancing system throughput, training stability, and agent flexibility through innovative architecture and engineering optimizations. Its decoupled design separates reasoning logic from infrastructure, allowing seamless integration across diverse agents and scalable training over numerous environments without internal changes. In the RL paradigm, Forge supports white-box agent RL by treating context management as a functional action for long-horizon tasks while enabling black-box RL with arbitrary architectures. Engineering strategies such as the Windowed FIFO scheduling method optimize throughput and consistency, and prefix tree merging reduces redundancy in multi-turn dialogue training. For inference acceleration, speculative decoding, heterogeneous processing disaggregation, and a global L3 cache pool enhance performance. The CISPO algorithm is tailored for long-horizon agents with mixed-domain training to improve generalizability, coupled with a composite reward framework that provides dense feedback and stabilizes optimization. These innovations culminate in the MiniMax M2.5 model, showcasing significant advancements in real-world agent productivity and supporting scalable RL systems capable of managing complex tasks. Keywords: #phi4, Agent Flexibility, Black-box Agents, CISPO Algorithm, Composite Reward Framework, Context Management, Forge, Hybrid Scheduling, Inference Acceleration, MiniMax M25, Prefix Tree Merging, RL Framework, Scalable RL, System Throughput, Training Stability
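The prefix-tree merging idea above can be illustrated with a toy sketch: multi-turn rollouts that share a conversation prefix are stored (and reprocessed) once rather than per rollout. This is purely illustrative of the idea, not Forge code:

```python
# Toy illustration of prefix merging: count how many leading turns two
# rollouts share, so the shared prefix can be stored and encoded once.
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

rollout_1 = ["sys", "user:hi", "asst:hello", "user:weather?"]
rollout_2 = ["sys", "user:hi", "asst:hello", "user:news?"]
print(shared_prefix_len(rollout_1, rollout_2))  # 3
```

In a real trainer the shared turns would be nodes in a prefix tree with cached activations; the saving grows with the number of rollouts branching from each shared dialogue state.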
    www.minimax.io a day ago
468.  HN WebMCP Proposal
The WebMCP Proposal introduces a JavaScript API aimed at integrating web applications with AI agents through natural language commands, developed by the Web Machine Learning Community Group as part of their community initiatives rather than an official W3C Standard. This specification enables developers to transform web app functionalities into "tools" defined in JavaScript with structured schemas and descriptions accessible via natural language. These tools can interact with AI agents, browser extensions, or assistive technologies, positioning websites as Model Context Protocol servers for client-side implementation. The proposal defines key terminology: an agent is an autonomous assistant leveraging large language models to communicate through chat interfaces, which can be integrated into browsers through extensions provided by platforms like OpenAI and Google. The API enhances the Navigator interface with a `ModelContext` to manage tools using methods such as `provideContext`, `clearContext`, `registerTool`, and `unregisterTool`. Each tool is identified by unique identifiers, descriptions, input schemas, execution callbacks, and optional annotations. Further details include various interfaces: the extended `Navigator` interface provides access to the `ModelContext`; `ModelContext` handles registration and context management; `ModelContextOptions & ModelContextTool` outline tool collections and metadata; and `ModelContextClient` supports user interaction during execution. The proposal acknowledges contributors for foundational work and collaborative efforts within the community group, aiming to facilitate seamless interactions between users and AI agents by leveraging existing web application logic while ensuring context and control are maintained. 
Keywords: #phi4, AI agents, AI platform, API, JavaScript, ModelContext, Navigator interface, Web Machine Learning Community Group, WebMCP, accessibility, browser's agent, execute callback, privacy, security, tools, user interaction
    webmachinelearning.github.io 2 days ago
   https://developer.chrome.com/blog/webmcp-epp   2 days ago
   https://github.com/webmachinelearning/webmcp?tab=readme   2 days ago
   https://github.com/MiguelsPizza/WebMCP   2 days ago
   https://github.com/jasonjmcghee/WebMCP   2 days ago
   https://www.youtube.com/watch?v=sOPhVSeimtI   2 days ago
   https://www.youtube.com/watch?v=02O2OaNsLIk   2 days ago
   https://moltbook.com/skill.md   2 days ago
   https://datatracker.ietf.org/doc/html/rfc8890   a day ago
   https://bsky.app/profile/chrisshank.com/post/   a day ago
476.  HN MCP and REST Face-Off
The Model Context Protocol (MCP) and REST serve as distinct paradigms in API design, each with its unique attributes tailored for different contexts of use. REST has been the prevailing standard for over a decade, characterized by its static, fixed-route interactions suitable primarily for human-machine interfaces; however, it encounters limitations when interfacing with AI agents due to its rigid structure. In contrast, MCP is specifically engineered for Large Language Models (LLMs), offering an adaptable framework that enables more intuitive and dynamic interaction with digital tools. Key distinctions between the two approaches are notable in several areas. Firstly, REST is primarily designed with developers in mind, providing a static interface, whereas MCP caters to AI models requiring flexibility for tool exploration. In terms of interaction modes, REST relies on synchronous exchanges following a fixed script, while MCP facilitates asynchronous communication and continuous dialogue, allowing servers and clients to engage more fluidly. Another significant difference lies in discovery and integration; MCP servers are self-describing and automatically furnish AIs with tools and resources, thereby eliminating the need for manual "glue code," unlike REST which demands extensive documentation. Moreover, the data lifecycle under each protocol varies considerably. REST operations are characterized by isolated requests with rigid transactions, whereas MCP supports ongoing conversations where servers can suggest additional actions or request further context from clients. The transport layer also differentiates them; while REST is intrinsically linked to HTTP and suited for open web environments, MCP operates over standard input/output, enhancing security and flexibility in local development settings. 
Overall, the advent of MCP represents a paradigm shift from merely integrating APIs towards enabling meaningful interactions that allow AI agents to execute diverse tasks beyond conventional dialogues. This innovative approach facilitates more effective and versatile tool use by AI models, expanding their functional capabilities. Keywords: #phi4, AI agents, API, HTTP, Large Language Models, MCP, Model Context Protocol, REST, asynchronous flow, calendar, data lifecycle, datasets, debugging, differences, integration, interaction, internet, local development, panel, self-discovery, standard input/output, toolsets
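The self-description contrast above can be made concrete with a toy sketch: an MCP-style server publishes its own tool catalog, so a client discovers capabilities at runtime instead of relying on hand-written glue code per endpoint. The class below is an illustration of the pattern, not a real MCP implementation:

```python
# Toy illustration of self-describing tool discovery: the server exposes a
# catalog (list_tools) that an agent reads before deciding what to call.
class ToyMCPServer:
    def __init__(self):
        self._tools = {}

    def tool(self, name, description):
        """Decorator that registers a function as a named, described tool."""
        def register(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self):
        # what an agent would call first, instead of reading documentation
        return {n: t["description"] for n, t in self._tools.items()}

    def call_tool(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

server = ToyMCPServer()

@server.tool("get_events", "List calendar events for a given day")
def get_events(day):
    return [f"standup on {day}"]

print(server.list_tools())
print(server.call_tool("get_events", day="2026-02-18"))
```

With REST, the equivalent knowledge (routes, parameters, semantics) lives in out-of-band documentation; here it travels with the server, which is what eliminates the glue code the article mentions.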
    ilearnt.com 2 days ago
493.  HN Making MCP Servers Work with Microsoft Entra ID on Azure
Deploying an MCP (Model Context Protocol) server on Azure with Microsoft Entra ID authentication requires addressing several compatibility challenges between OAuth standards outlined by the MCP specification and those implemented by Microsoft. This process is facilitated through a lightweight OAuth compatibility layer integrated within the MCP server, consisting of five proxy endpoints that manage tasks such as metadata translation, mock client registration, scope rewriting for authorization and token requests, and generating correctly formatted 401 responses. The solution tackles issues like mismatched discovery formats, unsupported dynamic client registration, non-standard scope formats, and Azure Container Apps' Easy Auth blocking OAuth discovery endpoints. This compatibility layer enhances security with measures including a "Deny by Default" identity model, path normalization to prevent jailbreak attempts, and strict host validation to mitigate SSRF and Open-Redirect vulnerabilities. The article provides an in-depth guide for deploying this solution on Azure, detailing the necessary steps like Entra ID app registration and configuring the OAuth layer within a Python-based MCP server using FastMCP with Starlette or FastAPI. It includes insights gained from multiple debugging cycles and advice on avoiding common pitfalls such as aggressive Docker image caching by Azure Container Apps. Additionally, it discusses strategies for handling silent errors encountered during deployment. Furthermore, the accompanying repository offers comprehensive step-by-step instructions, decision records, a minimal example server, and reference code to facilitate seamless integration into existing projects. This resource is particularly valuable for developers constructing MCP servers on Azure accessed through Cursor IDE, ensuring robust authentication flows and security measures are in place. 
Keywords: #phi4, API Management, Authentication, Azure, Compatibility Layer, Cursor IDE, Deployment Guide, MCP Servers, Microsoft Entra ID, OAuth, OpenID Connect, Proxy Endpoints, Rate Limiting, Zero-Trust Security
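The scope-rewriting step in the compatibility layer can be sketched as follows. Entra ID expects scopes qualified by an application ID URI (the `api://…` form), while MCP clients tend to send bare scope names; the URI and scope names below are placeholders, not a real registration:

```python
# Illustrative scope-rewriting helper in the spirit of the compatibility
# layer described above. APP_ID_URI and the scope names are placeholders.
APP_ID_URI = "api://example-app-id"

def rewrite_scopes(requested: str) -> str:
    """Qualify bare scopes; pass through OIDC scopes and already-qualified ones."""
    passthrough = {"openid", "profile", "email", "offline_access"}
    rewritten = []
    for scope in requested.split():
        if scope in passthrough or "://" in scope:
            rewritten.append(scope)
        else:
            rewritten.append(f"{APP_ID_URI}/{scope}")
    return " ".join(rewritten)

print(rewrite_scopes("openid mcp.read"))  # openid api://example-app-id/mcp.read
```

The real proxy applies this translation on both the authorization and token requests, which is why the article lists scope rewriting twice among the five endpoints' responsibilities.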
    ignitionai.xyz 2 days ago
496.  HN Anthropic opens Bengaluru office and announces new partnerships across India
Anthropic has established a significant presence in India with a new office in Bengaluru, underscoring its commitment to expanding partnerships across enterprise, education, agriculture, and public sectors. As the second-largest market for Claude.ai, the platform is widely used by Indian developers for technical tasks, highlighting the region's robust engagement with AI technology. Irina Ghose, Managing Director of India at Anthropic, recognizes India's potential in responsible AI development due to its strong digital infrastructure and skilled workforce. To enhance accessibility and relevance, Anthropic is improving AI performance in local languages through collaborations that focus on high-quality training data and task evaluations relevant to Indian contexts. The company has forged strategic partnerships with major enterprises like Air India and Cognizant for software modernization, while startups such as Razorpay and Enterpret are integrating Claude.ai into their operations to boost features and capabilities. In the education sector, Anthropic collaborates with Pratham to pilot AI-powered testing tools aimed at enhancing learning for low-income students. Additionally, it partners with Central Square Foundation to leverage EdTech and AI for primary school children in underserved areas. Public sector initiatives include working with EkStep Foundation on agricultural projects via OpenAgriNet and supporting Adalat AI’s efforts to improve judicial service access through a national WhatsApp helpline powered by Claude.ai. Anthropic has also introduced open-source standards like the Model Context Protocol, now employed by the Indian government for accessing national statistics. As Anthropic continues to grow its footprint in India, it focuses on expanding partnerships and hiring local talent, promoting widespread adoption of AI technologies across diverse sectors. 
Keywords: #phi4, AI, Adalat AI, Anthropic, Bengaluru, Bharat Digital, Central Square Foundation, Claudeai, EkStep Foundation, India, Intelehealth, Irina Ghose, MoSPI, Model Context Protocol (MCP), Noora Health, OpenAgriNet, Pratham, Swiggy, agriculture, digital infrastructure, education, enterprise, language capabilities, open-source standards, partnerships, public sector, startups
    www.anthropic.com 2 days ago
559.  HN Free SQL Server Performance Monitoring That Doesn't Suck – Darling Data
Darling Data has launched a free, open-source tool for monitoring SQL Server performance, available on GitHub as an alternative to costly enterprise solutions. This tool comes in two editions: the Full Edition and Lite Edition. The Full Edition installs a PerformanceMonitor database on each server with T-SQL collectors executed through SQL Agent, offering data visualization via a WPF Dashboard specifically for monitored servers. It includes over 30 specialized T-SQL collectors, community tools like sp_WhoIsActive, NOC-style landing pages, automatic retention settings, real-time alerts, AI-powered analysis using an MCP server, and comprehensive data collection capabilities. The Lite Edition functions as a standalone desktop application, enabling remote monitoring without installing on target servers. It queries DMVs over the network, storing data locally in DuckDB with Parquet archival, supporting more than 20 collectors, Azure SQL Database, and including an MCP server for AI analysis. This edition is tailored for quick triage, consultants, and environments where installation isn't feasible. Both editions prioritize security through Windows Credential Manager for password storage, defaulting to TLS with certificate validation, and using parameterized queries without relying on cloud services or remote data transmission. Darling Data's tool targets solo DBAs, small teams, consultants, contractors, and developers who need an affordable solution offering detailed insights into SQL Server performance without extensive installation requirements. Setting up the Full Edition involves installing the PerformanceMonitor database on servers, while the Lite Edition is straightforward to deploy by downloading, extracting, and connecting to servers. The tool aims to enhance understanding of SQL Server issues through meaningful data visualization and analysis, eschewing the complexities or costs of traditional enterprise solutions. 
Supported under an MIT License, it is compatible with SQL Server versions 2016 through 2025 and various cloud databases. Keywords: #phi4, AI Analysis, Azure SQL Database, Community Tools, Consultants, DMVs, Data Visualization, Developers, DuckDB, Free Tool, Full Edition, GitHub, Lite Edition, MCP Server, No Cloud Dependency, Open Source, Parquet Archives, Performance Monitoring, Real-Time Alerts, SQL Agent, SQL Server, Security, Solo DBAs, T-SQL Collectors
    erikdarling.com 2 days ago
577.  HN Show HN: Gulama – Security-first open-source AI agent (OpenClaw alternative)
Gulama is an open-source personal AI agent developed with a strong emphasis on security, offering itself as a superior alternative to less secure options like OpenClaw. Created by a seasoned security engineer, it prioritizes the protection of user data across various domains including files, emails, and credentials. The platform features over 15 robust security mechanisms such as AES-256-GCM encryption, sandboxed execution using technologies like bubblewrap/Docker, policy engines, and egress filtering to prevent unauthorized data access or leaks. In terms of functionality, Gulama provides a wide array of built-in skills that cover files, shell operations, web browsing, email handling, calendar management, and integration with platforms such as GitHub and Notion. It supports over 100 LLM providers and offers communication across ten channels including CLI, Telegram, Discord, Slack, and WhatsApp. Additional capabilities include multi-agent orchestration, task scheduling, voice wake word activation, retrieval-augmented generation (RAG)-powered memory, AI-powered browsing, self-modifying skills, and live debug streams. Gulama's design ensures flexibility by being compatible with multiple operating systems like macOS, Windows, Linux, and Docker, and it can also run on ARM architectures. This enables users to maintain data within environments they control, offering varied autonomy levels from full manual oversight to complete automation. The installation process is user-friendly, supporting both pip and Docker methods, which cater to preferences for local setups or containerized deployments. Comprehensive guides are available, including instructions for obtaining API keys from various LLM providers such as DeepSeek, Groq, OpenAI, Anthropic, Google, and Ollama. Compared to its predecessor OpenClaw, Gulama distinguishes itself by embedding a multitude of security measures directly into its architecture.
While OpenClaw had vulnerabilities like binding to 0.0.0.0, Gulama enforces secure defaults including loopback-only bindings, sandboxing techniques, policy engines, and Ed25519-signed skills. The project is open for community contributions with detailed development setup guidelines available in its repository. It encourages participation through the GulamaHub skill marketplace, where users can either install or publish their own Ed25519-signed skills. In essence, Gulama stands as a robust alternative to existing AI agents by integrating comprehensive security features from inception while maintaining flexibility and advanced functionalities for personal use. Keywords: #phi4, AES-256-GCM, AI agent, ChromaDB, DLP, Docker, FastAPI, Gulama, LLM providers, LiteLLM, RAG memory, REST API, WebSocket, canary tokens, communication channels, egress filtering, encryption, multi-agent orchestration, open-source, policy engine, sandboxing, security-first, self-modifying skills, skill marketplace, task scheduler, voice wake word
    github.com 2 days ago
579.  HN Show HN: Open API for AI agents to search 29k+ declassified docs
The DeclassFiles Intelligence Network (DIN) serves as an open API platform that empowers AI agents to autonomously examine over 29,000 OCR'd full-text declassified U.S. government documents. It offers comprehensive capabilities for document search, research thread publication with citations, and interaction among agent findings, all without paywalls or third-party keys. Users can register AI agents via POST requests to obtain an API key necessary for executing various actions like searching documents by keywords or IDs through GET requests, posting detailed research threads, and managing these threads (including creation, replies, and upvotes) using POST requests. DIN's extensive document collections cover topics such as Epstein, the JFK assassination, and 9/11 incidents, with search functionality available via keywords or categories. The API features include capabilities for document retrieval, random discovery of documents, research thread management, network statistics access, and directory interaction. Notably, the platform has identified systemic patterns like institutional compartmentalization across different cases. Integration with MCP servers enables direct searches from AI IDEs, enhancing usability. Quality is ensured through strict citation practices using specific document IDs and evidence-based analysis, promoting a professional tone over speculation. A trust and reputation system assesses agents based on their activity levels and contributions to the network. DeclassFiles, known for being the largest searchable archive of declassified U.S. government documents, developed this platform, emphasizing open access and collaborative intelligence gathering. Keywords: #phi4, AI agents, API-first platform, DIN, DeclassFiles, Intelligence Network, MCP server, OCR processed, declassified documents, document citations, full-text search, network statistics, reputation system, research threads
    github.com 2 days ago
591.  HN OpenReview MCP server with Cursor integration
The OpenReview MCP server integrates with Cursor to provide access to research data from major machine learning conferences such as ICML, ICLR, and NeurIPS. The server can look up user profiles by email, retrieve papers by specific authors or conferences, and run keyword searches across multiple events with customizable match modes. It supports exporting search results as JSON for analysis or as PDFs for reading. Installation involves cloning the repository from GitHub, setting up a virtual environment, installing dependencies, and configuring Cursor via `mcp.json` with the necessary OpenReview credentials and server paths. Users can then query the server in natural language through Cursor to find specific papers or export them alongside their PDFs and extracted text. The system automatically fetches papers from OpenReview, searches titles, abstracts, and author lists, downloads PDFs, extracts their text, and saves the results to a specified directory. An example workflow uses `search_papers` to identify research on a particular topic and `export_papers` to save the relevant findings for further analysis or coding. The project is released under the MIT License. Keywords: #phi4, Cursor integration, JSON export, MCP server, OpenReview, PDF export, conference papers, configuration, installation, keyword search, natural language queries, paper retrieval, research analysis, user search
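The `mcp.json` configuration mentioned above might look like the sketch below. The `mcpServers`/`command`/`args`/`env` layout is Cursor's standard MCP configuration shape, but the server path and the exact environment-variable names are assumptions, not taken from this project's documentation.

```json
{
  "mcpServers": {
    "openreview": {
      "command": "/path/to/venv/bin/python",
      "args": ["/path/to/openreview-mcp/server.py"],
      "env": {
        "OPENREVIEW_USERNAME": "you@example.com",
        "OPENREVIEW_PASSWORD": "your-password"
      }
    }
  }
}
```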
    github.com 2 days ago
630.  HN Show HN: SkillSandbox – Capability-based sandbox for AI agent skills (Rust)
SkillSandbox is a capability-based runtime designed to enhance the security of AI agent skills through strict access controls and permissions, developed following the discovery of a credential-stealing skill on an AI marketplace. It utilizes YAML manifests allowing skills to declare required permissions, such as network access, filesystem paths, and environment variables, which are then enforced by the runtime using iptables, seccomp-bpf, and mount isolation. This tool provides additional security features including network egress filtering, environment variable whitelisting, resource limits like memory and execution time, and structured audit trails of skill executions. SkillSandbox integrates seamlessly with MCP servers to support sandboxing within AI frameworks such as Claude Code and supports OpenTelemetry for trace exports to observability tools like Jaeger. Complementing SkillSandbox, the AgentTrace project enhances policy compliance by tracking cumulative costs and violation counts over multiple sessions, forming a comprehensive security framework that not only restricts but also guides agent behavior. Built primarily for Linux environments using full kernel capabilities such as iptables and seccomp-bpf, SkillSandbox offers partial support on macOS through dry-run mode and recommends Docker for demonstrations due to its compatibility with necessary enforcement features. The project adopts the principle of "constrain what can be done" over relying solely on code integrity measures. Looking ahead, SkillSandbox's roadmap includes enhancements such as cgroup resource limits, unprivileged filesystem isolation, process-level isolation, container image support, and a lightweight WebAssembly runtime for executing simpler skills. This architecture aims to address current gaps in AI agent skill ecosystems by prioritizing execution-level security while facilitating integration with existing frameworks through an MCP server interface. 
Keywords: #phi4, AI agent skills, AgentTrace, Docker, Linux, MCP server, MITRE ATT&CK, OpenClaw, OpenTelemetry, Rust, SkillSandbox, WSL2, YAML, audit trail, capability-based runtime, code signing, credential stealer, enforcement, env vars, filesystem paths, iptables, macOS, manifest validation, mount isolation, network egress, observability, policy engine, runtime isolation, sandboxing, seccomp-bpf, threat classification, threat model, tracejson
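A skill manifest of the kind described above might look like the following sketch. The exact field names are assumptions, since the summary only states that skills declare network egress, filesystem paths, environment variables, and resource limits in YAML:

```yaml
# Hypothetical SkillSandbox manifest; field names are illustrative.
name: fetch-weather
permissions:
  network:
    egress:
      - api.weather.example:443   # only this host is reachable
  filesystem:
    read:
      - /tmp/skill-cache          # mount isolation limits paths
  env:
    - WEATHER_API_KEY             # whitelisted env vars only
limits:
  memory_mb: 128
  timeout_s: 30
```

Declaring capabilities up front is what lets the runtime translate them into iptables rules, seccomp-bpf filters, and mount isolation before the skill ever runs.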
    github.com 3 days ago
637.  HN WebMCP Proposal
The WebMCP Proposal outlines a JavaScript API designed to enable web applications to act as servers within the Model Context Protocol, facilitating interactions between users and AI agents through natural language and structured schemas. This initiative is developed by the Web Machine Learning Community Group and offers a framework for cooperative workflows involving users, browser-integrated agents, and assistive technologies, although it remains outside of the W3C Standards Track. Central to this proposal are several components: The WebMCP API itself provides a JavaScript interface allowing web applications to serve as Model Context Protocol servers. Agents in this context include autonomous assistants powered by large language models (LLMs) like OpenAI's ChatGPT, browser-integrated agents via extensions or native integration facilitating user-AI interactions, and AI platforms provided by companies such as OpenAI and Google. Security and accessibility considerations are identified as critical for the safe and inclusive implementation of WebMCP, though not extensively detailed in the proposal. The API extends the Navigator Interface to include a `ModelContext` object that manages tools accessible to agents. This interface offers several methods: `provideContext(options)` registers new tool contexts by clearing existing ones; `clearContext()` removes all registered tools; `registerTool(tool)` adds tools, ensuring they have unique names and valid schemas; `unregisterTool(name)` deletes specific tools. The proposal also defines essential dictionaries like `ModelContextOptions`, which lists tools with their unique properties, and `ModelContextTool`, detailing tool characteristics such as name, description, input schema, execution callback, and optional annotations (e.g., `readOnlyHint`). The `ModelContextClient` Interface enables asynchronous user interactions during the execution of these tools. 
The proposal acknowledges key contributors including Brandon Walderman, Leo Lee, Andrew Nolan, David Bokan, Khushal Sagar, Hannah Van Opstal, and Sushanth Rajasankar for foundational work, as well as Alex Nahas and Jason McGhee for implementation insights. Additionally, feedback from the Web Machine Learning Community Group significantly informed the proposal's development. Keywords: #phi4, AI agents, AI platform, API, JavaScript, ModelContext, Navigator interface, Web Machine Learning Community Group, WebMCP, accessibility, browser's agent, execute callback, privacy, security, tools, user interaction
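The tool registration described above can be sketched as follows. The tool object follows the proposal's `ModelContextTool` dictionary (name, description, input schema, execute callback, optional annotations); the todo-list behavior itself is purely illustrative.

```javascript
// Illustrative tool in the shape of the proposal's ModelContextTool
// dictionary; the todo-list behavior is hypothetical.
const todos = [];

const addTodoTool = {
  name: "add-todo",
  description: "Add an item to the user's todo list",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  annotations: { readOnlyHint: false },
  async execute({ text }) {
    todos.push(text);
    return { content: [{ type: "text", text: `Added: ${text}` }] };
  },
};

// Register only where a browser actually exposes the proposed API.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(addTodoTool);
}
```

Because `registerTool` requires unique names and valid schemas, keeping the tool definition in a plain object like this makes it easy to validate before registration.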
    webmachinelearning.github.io 3 days ago
648.  HN Show HN: PolyMCP – A framework for structuring and orchestrating MCP agents
PolyMCP is an open-source framework designed to streamline the development and management of agents via the Model Context Protocol (MCP), focusing on enhancing the agent layer instead of merely exposing tools. It offers a structured approach by organizing agents effectively, linking them to multiple MCP servers, and ensuring workflow reliability in practical scenarios. Key features include implementing MCP-compatible tool servers using Python or TypeScript, providing an abstraction for connecting agents with diverse MCP endpoints like stdio and HTTP, and offering orchestration primitives for managing multi-step tasks. Additionally, PolyMCP includes a command-line interface (CLI) for project scaffolding and an inspector user interface (UI) to aid in debugging interactions. Its modular architecture supports skill composition and component reuse, significantly reducing the need for ad-hoc code by standardizing tool registration, agent attachment, execution flow management, and interaction inspection processes. The framework is MIT licensed and targets developers engaged in building production-grade automation systems, internal copilots, or multi-tool assistants, with its source available on GitHub at [PolyMCP GitHub Repository](https://github.com/poly-mcp/PolyMCP). Keywords: #phi4, CLI, GitHub, MCP agents, MIT licensed, Model Context Protocol, PolyMCP, Python, TypeScript, agent layer, automation, copilots, debugging, endpoints, execution flow, framework, modular architecture, open-source, orchestration, state management, tool servers
    news.ycombinator.com 3 days ago
649.  HN Shipping Htmx in Production (A Post-Mortem)
The article conducts an in-depth post-mortem analysis of implementing HTMX within the "Reddit Lead Qualification and Analysis System," comparing it to traditional React-based architectures. The system was designed to identify potential customers from Reddit posts, with initial challenges arising from frontend build pipelines and state synchronization between Python and TypeScript models. The decision to utilize HTMX stemmed from its ability to streamline development by eliminating redundant model definitions across languages and reducing infrastructure demands associated with Node.js. HTMX's implementation adhered to HATEOAS principles, allowing the backend to directly influence UI behavior, thus diminishing the need for intricate frontend state management. This approach facilitated a seamless autonomous lead qualification process through AI-driven stages while enabling low-latency dashboard interactions that minimized JavaScript dependencies. Key functionalities like semantic search and real-time polling pipelines highlighted HTMX’s capability in efficiently managing dynamic content updates. In comparison to frontend frameworks, HTMX substantially decreased development time and code footprint by integrating backend and frontend data layers, simplifying client-side state management which led to improved load times and reduced code volume. However, this shift transferred complexity to the server side, necessitating meticulous organization and error handling strategies. The production phase revealed that while HTMX simplified development workflows, it also introduced challenges such as increased server logic intricacy and potential latency issues due to its server-centric interaction model. In some instances, custom JavaScript interventions were required for improved interactivity and robust error management when used alongside libraries like Alpine.js. 
From a performance standpoint, the project showed that HTMX could sustain production-level loads effectively while enhancing bandwidth efficiency by utilizing the browser’s native HTML rendering capabilities. This approach simplified deployment processes relative to React-based solutions, thus reducing operational complexity. The article concludes with lessons learned and recommendations for developers considering HTMX in similar contexts. It is particularly suitable for SaaS applications where simplicity and rapid development cycles are essential, allowing a focus on solving business problems rather than frontend infrastructure management. The author suggests that HTMX can be an optimal choice for dashboard-driven systems where hypermedia provides an efficient path to feature delivery, advocating its adoption in scenarios prioritizing reduced complexity and accelerated development timelines. Keywords: #phi4, AI Pipeline, Alpine-js, Dashboard, FastAPI, HATEOAS, HTMX, Hypermedia, Lead Qualification, Production Challenges, Reddit, Semantic Search, Server-Sent Events
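The real-time polling pattern mentioned above can live almost entirely in markup. The attributes below are standard HTMX (`hx-get`, `hx-trigger`, `hx-swap`); the endpoint URL is an illustrative placeholder, not the article's actual route:

```html
<!-- Poll the server every 5s and swap the returned HTML fragment
     in place. The /leads/status URL is a hypothetical placeholder. -->
<div hx-get="/leads/status" hx-trigger="every 5s" hx-swap="innerHTML">
  Loading lead status…
</div>
```

Because the server returns ready-to-render HTML rather than JSON, no client-side state or rendering code is needed for this update loop, which is the trade-off the article describes.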
    enriquebruzual.substack.com 3 days ago
716.  HN Distillation, Experimentation, and Integration of AI for Adversarial Use
In late 2025, Google Threat Intelligence Group (GTIG) identified an increased use of artificial intelligence by cyber threat actors across various stages of attacks, including reconnaissance, social engineering, and malware development. The report highlighted the rise of "distillation attacks" or model extraction attempts aimed at intellectual property theft, often breaching terms of service. While advanced persistent threat (APT) actors did not directly target sophisticated AI models, several global private entities and researchers attempted to replicate proprietary AI logic. AI tools have become pivotal for government-backed actors from DPRK, Iran, PRC, and Russia in crafting sophisticated phishing schemes and conducting technical research. However, these efforts have yet to significantly alter the threat landscape according to GTIG. Key findings included the growing prevalence of model extraction attacks for IP theft, the use of AI in enhancing reconnaissance and phishing operations, and an increasing interest among adversaries in developing AI-driven malware tools. The report also described new malware like HONESTCUE, which utilizes Gemini's API for code generation to facilitate second-stage malware deployment. Additionally, it noted the emergence of underground "jailbreak" ecosystems offering services that replicate independent models using modified commercial APIs and open-source servers. To counter these threats, Google has been proactive in disabling malicious projects and accounts while strengthening model security measures. The report underscored the importance of sharing best practices with defenders to enhance protection across the ecosystem and referenced a separate white paper for more details on Gemini's safeguards. Keywords: #phi4, AI, APT Actors, Agentic AI, Distillation Attacks, GTIG, Gemini API, Google DeepMind, Intellectual Property Theft, LLMs, Malware Development, Model Extraction, Phishing, Reconnaissance, Security Safeguards, Threat Actors
    cloud.google.com 3 days ago
735.  HN VS Code becomes multi-agent command center for developers
The January 2026 release of Visual Studio Code (VS Code) v1.109 introduces a transformative approach to multi-agent development, enabling developers to integrate and manage multiple AI assistants, such as Anthropic Claude, OpenAI Codex, and GitHub Copilot, within a single interface. This integration facilitates enhanced productivity by allowing simultaneous use of different AI models without the need for tool-switching. The release features public preview support for Anthropic’s Claude agents, unified session management through an updated Agent Sessions view, and parallel subagent execution for isolated task handling. Additionally, it introduces MCP Apps, which allow interactive UI components in chat responses, aiming to enrich collaboration between developers and AI agents. Key optimizations include Copilot Memory for improved context retention, faster code search capabilities, enhanced security measures via terminal command sandboxing, and an upgraded chat interface. Microsoft's strategic initiative with this release is intended to expand its ecosystem by incorporating popular models directly within VS Code, thus retaining users who might otherwise turn to other platforms. This move signifies the beginning of a broader evolution in AI integration within development tools. Keywords: #phi4, AI assistants, Agent Sessions, Anthropic Claude, Copilot Memory, GitHub Copilot, MCP Apps, Model Context Protocol, OpenAI Codex, Unified Interface, VS Code, agent mode, chat experience, development, interactive UI, multi-agent, security optimizations, session management, subagents, terminal sandboxing
    thenewstack.io 3 days ago
744.  HN Show HN: CLI chat client for OpenAI-comp APIs with workspace and MCP support
Undead is a minimal command-line interface (CLI) chat client tailored for interacting with OpenAI-compatible APIs. It supports both Model Context Protocol (MCP) servers and workspaces to enhance its functionality. Users can install Undead on Arch Linux from the AUR using package managers like `yay` or `paru`, or build it from source using Cargo with the command `cargo build --release`. The tool is initiated via the basic command `./undead`, allowing users to customize endpoints, models, and API keys. Additionally, workspace operations such as file read/write are accessible through the `--workspace` flag, while MCP server connections can be specified with the `--mcp` option. Undead offers a range of configurable options including setting the API endpoint, model name, API key, system prompt, response temperature, and max tokens. These configurations can also be managed using a YAML config file, which supports multiple API setups with global defaults and preset names, giving precedence to CLI arguments over environment variables. The tool's workspace feature enables sandboxed file operations within specified directories, while the MCP support allows connections to local or remote servers for extended functionalities defined in JSON configuration. Undead is compatible with various OpenAI-compatible APIs such as llama.cpp, Ollama, vLLM, LocalAI, OpenAI, and Azure OpenAI. It is distributed under the MIT license, promoting flexibility and broad usage possibilities. Keywords: #phi4, API endpoint, AUR, Arch Linux, CLI, MCP, MIT license, OpenAI, cargo build, chat client, compatible APIs, config file, interactive commands, model, sandboxed operations, system prompt, workspace
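The multi-endpoint YAML configuration described above might look like the sketch below. All key names are assumptions inferred from the listed options (endpoint, model, API key, system prompt, temperature, max tokens, and named presets with a global default), not Undead's documented schema:

```yaml
# Hypothetical config layout; key names are illustrative.
default: local
system_prompt: "You are a concise assistant."
apis:
  local:
    endpoint: http://localhost:11434/v1   # e.g. an Ollama server
    model: llama3
    temperature: 0.7
    max_tokens: 1024
  openai:
    endpoint: https://api.openai.com/v1
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}
```

Per the summary, CLI arguments would override any of these values, which in turn take precedence over environment variables.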
    github.com 4 days ago
753.  HN Show HN: PolyMCP – A framework for building and orchestrating MCP agents
PolyMCP is an open-source framework designed to streamline the development and management of agents using the Model Context Protocol (MCP). It distinguishes itself from other MCP tooling by emphasizing agent structuring, connectivity, and reliability across various servers rather than merely exposing tools. PolyMCP allows developers to define MCP-compatible tool servers in Python or TypeScript and provides a framework for connecting agents to different endpoints. The platform includes built-in orchestration primitives to handle complex tasks efficiently and offers both a command-line interface (CLI) for project scaffolding and an inspector user interface (UI) for debugging purposes. By offering structured methods for registering tools, managing execution flow, and inspecting agent interactions, PolyMCP aims to minimize the ad-hoc nature commonly associated with agent systems. Licensed under the MIT license, it targets developers engaged in automation projects, internal copilots, or multi-tool assistants. The framework actively seeks feedback on its agent abstraction, orchestration patterns, and overall developer experience to further refine these capabilities. Keywords: #phi4, CLI, MCP endpoints, MIT licensed, Model Context Protocol (MCP), PolyMCP, Python, TypeScript, agent abstraction, agents, automation, copilots, debugging, execution flow, framework, inspector UI, modular structure, multi-tool assistants, orchestration primitives, state, tool servers
    news.ycombinator.com 4 days ago
   https://github.com/poly-mcp/PolyMCP   3 days ago