Scraper Spider


2026-02-18 17:27
gemini cli
gemini cli stories from the last 14 days
49.  HN Agent Skills 101: a practical guide for engineers
"Agent Skills 101: A Practical Guide for Engineers" offers a structured methodology to enhance AI agents' capabilities within engineering teams by developing skills as markdown files (SKILL.md) containing procedural knowledge tailored to team-specific needs. These skills enable AI agents to consistently apply the correct procedures without requiring constant guidance, addressing context gaps in problem-solving related to tools, deployment processes, and testing strategies. The guide introduces a three-phase skill loading system—metadata, instructions, and resources—to optimize token usage and prevent cognitive overload. A SKILL.md file comprises YAML frontmatter for metadata and a markdown body detailing executable procedures, with optional fields like allowed-tools that can restrict tool usage during tasks. The description field serves as the trigger for skills, written in third person to ensure activation based on relevance without prematurely revealing details. Skills are organized at project, personal, or extension levels, with project-level precedence in shared environments. They differ from other technologies such as custom instructions, AGENTS.md, prompt files/commands, MCP servers, bundles, and workflows by focusing on task-specific procedural knowledge and activation relevance. Bundles group related skills for roles or projects, while workflows sequence multiple skills into comprehensive procedures. Installation and management of community skills are facilitated via a CLI tool (`npx skills add`), with storage in directories like `.skills.sh` or `.github/skills/`. The guide advises reviewing `SKILL.md` files to ensure quality and safety before installation due to the unmoderated nature of public community skills. Platform-specific management varies, with VS Code providing a diagnostics view for issue identification, Claude Code supporting auto-discovery, Gemini CLI requiring user consent for activation, and Cursor allowing toggling of Agent Skills in settings. Validation is achievable using `npx skills-ref validate`, ensuring compliance with frontmatter structure and field constraints. Skill catalogs aid in managing extensive collections by listing available skills alongside categories and keywords, while bundles assist in skill discovery and learning paths. Workflow patterns prioritize documentation over specifications to link multiple skills into multi-step procedures like "Ship a feature." The guide emphasizes concise `SKILL.md` descriptions (under 1,024 characters) and body text limits (200 words or under 500 lines for frequently-loaded and standard skills, respectively). Creating a skill involves identifying repetitive tasks, setting up directories, writing SKILL.md with name, description, workflow, and rules, and refining trigger conditions through testing. Platform-specific notes highlight differences in skill loading, validation support, and management features across tools like VS Code, Claude Code, Cursor, Gemini CLI, and OpenAI Codex, ensuring effective integration of skills into engineering workflows. 
Keywords: #phi4, AGENTSmd, AI agents, Agent Skills, CLI tools, Cursor Rules, MCP servers, Markdown body, Progressive Disclosure, YAML frontmatter, agent consent, allowed-tools, authentication, bundles, community, compatibility, context efficiency, cross-agent communication, custom instructions, documentation, domain expertise transfer, engineers, environment requirements, extension skills, installation, instructions, live data access, metadata, mistakes, patterns, personal skills, platform, portability, power cord, procedural knowledge, project skills, prompt files, real-time streaming, references, resources, rules, skill activation, skill authoring, skill catalog, skill directory, skill discovery, skill management, skill storage, storage locations, tags, tooling, triggers, user manual, validation, verification steps, workflows, write operations
    gist.github.com 4 hours ago
118.  HN Show HN: SentinelGate – Universal Firewall for AI Agents (Open Source, Go)
SentinelGate is an open-source firewall developed in Go, specifically designed to enhance security for AI agents by intercepting and controlling access to various machine operations like tool calls, shell commands, file access, and HTTP requests. It employs Role-Based Access Control (RBAC) via Common Expression Language (CEL) policies, ensuring a detailed audit trail of all activities. Key features include acting as an intermediary that evaluates actions against predefined policies without requiring code changes to the AI agent’s codebase. SentinelGate offers quick setup on macOS, Linux, and Windows platforms, either through a script or by building from source. The Admin UI facilitates policy creation, management, and access to audit logs without needing configuration file edits. It enforces deterministic rules to prevent unauthorized operations, such as blocking simple tool patterns like `delete_*`. Detailed logging records actions with identity, decision, timestamp, and arguments. Users can manage policies and monitor AI agent activities using a browser-based UI, with options to run SentinelGate as either an MCP proxy for agents or a standalone MCP server. Despite its effectiveness in preventing accidental misuse or prompt injection by AI agents, it is not an OS-level sandbox and thus may be bypassed by malicious processes. Commercial offerings under SentinelGate Pro include additional features like Single Sign-On (SSO), Security Information and Event Management (SIEM) integration, and compliance reporting. The project is open-source under the AGPL-3.0 license, with commercial options available via sentinelgate.co.uk, and encourages contributions following guidelines in the CONTRIBUTING.md file. Keywords: #phi4, AI agents, API keys, Admin UI, CEL policies, Go, HTTP requests, MCP tool calls, Open Source, RBAC, SIEM integration, SSO, SentinelGate, Universal, audit trail, compliance reports, configuration, firewall, limitations, proxy, runtime hooks, sandbox, security, shell commands
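As a purely illustrative sketch of the kind of CEL-based deny rule described (blocking tool patterns such as `delete_*`), a policy might be shaped roughly like this; the field names and schema here are hypothetical, not SentinelGate's actual configuration format.

```yaml
policies:
  - name: block-destructive-tools
    # CEL expression evaluated against the intercepted action (schema assumed for illustration)
    match: 'action.kind == "tool_call" && action.name.startsWith("delete_")'
    decision: deny
  - name: allow-read-only-files
    match: 'action.kind == "file_access" && action.mode == "read"'
    decision: allow
```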
    github.com 8 hours ago
143.  HN Show HN: Disco Checkers
Disco Checkers is a dynamic terminal-based checkers game crafted in Python 3 that operates without any extra installation requirements. Utilizing the Gemini CLI and Gemini 3 Flash model, it offers a unique dual-perspective view of the board for both Red's and Black's players. The game distinguishes itself with vibrant disco-inspired aesthetics, including an animated header, walking lights border, flashing king squares, and dynamically changing colors on special squares. Built using an Immutable Core / Imperative Shell architecture, Disco Checkers ensures reliable state management through dataclass definitions, pure functions for move calculations, and efficient rendering with ANSI colors. Thoroughly tested with unit tests that cover game rules, complex scenarios, visual effects, and string manipulation utilities, the game requires Python 3.7 or higher and a terminal capable of handling Unicode and ANSI color codes. To play, users simply run `python3 main.py`, choosing either human or CPU opponents for each side and making moves via displayed hotkeys, with the option to exit by pressing 'q'. The project is open-source under the MIT license. Keywords: #phi4, ANSI Colors, ANSI Utilities, Dataclass Objects, Disco Checkers, Dual Perspective, Event Loop, Gemini CLI, Immutable State-Machine, King Promotion, Multi-Jumps, One-Touch Input, Pure Functions, Python3, TTY State, Terminal Game, Unicode Support, Unit Tests, Vibe-coding, Visual Effects
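The Immutable Core / Imperative Shell split mentioned above can be sketched in a few lines of Python: state lives in frozen dataclasses and moves are pure functions that return new state. The field names below are illustrative, not Disco Checkers' actual definitions.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Piece:
    row: int
    col: int
    color: str          # "red" or "black"
    is_king: bool = False

@dataclass(frozen=True)
class GameState:
    pieces: Tuple[Piece, ...]
    turn: str           # whose move it is

def move_piece(state: GameState, piece: Piece, row: int, col: int) -> GameState:
    """Pure function: returns a new GameState and never mutates the old one."""
    moved = replace(piece, row=row, col=col,
                    is_king=piece.is_king or row in (0, 7))   # simplified back-rank promotion
    others = tuple(p for p in state.pieces if p is not piece)
    next_turn = "black" if state.turn == "red" else "red"
    return GameState(pieces=others + (moved,), turn=next_turn)
```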
    github.com 10 hours ago
189.  HN Show HN: DevDay – End-of-day recap for AI coding sessions
DevDay is a privacy-focused tool designed for developers utilizing multiple AI coding assistants such as OpenCode, Claude Code, and Cursor. It offers end-of-day recaps of AI-assisted coding sessions by analyzing local session data in conjunction with git commits, thereby facilitating the creation of standup-ready summaries through integrations with platforms like Concentrate AI, OpenAI, or Anthropic. Key features include local-only operation for enhanced privacy, detailed insights into tokens used, estimated costs, duration, and models per session, as well as session grouping by project with associated git commit displays. Users can optionally generate first-person standup messages to streamline reporting. To use DevDay, developers must install it via npm using the command `npm install -g devday`, after which they can access daily recaps or summaries through various commands such as `devday`, `devday -d [date]`, or `devday --standup`. The tool is optimized for macOS and supports further customization by cloning its repository, building it, and linking it. Optional LLM summaries necessitate the configuration of API keys from Concentrate AI (recommended), OpenAI, or Anthropic, with Concentrate AI providing free credits to offset summarization costs over extended periods. DevDay estimates session durations based on message processing times and calculates costs using token counts when not directly provided by tools, thus offering comprehensive insights into development workflows. Keywords: #phi4, AI coding sessions, API key, Anthropic, Concentrate AI, DevDay, OpenAI, git commits, local data, macOS support, npm install, session recap, standup summaries, token counts
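The commands named in the summary cover the typical flow; the date value below is just an example.

```sh
npm install -g devday      # install the CLI (macOS)
devday                     # end-of-day recap for today's AI coding sessions
devday -d 2026-02-17       # recap for a specific date
devday --standup           # first-person standup summary (requires an API key, e.g. Concentrate AI)
```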
    github.com 20 hours ago
320.  HN Show HN: Discoding – run AI CLIs locally, relay them to Discord
Discode is a locally-run tool designed to integrate AI coding Command Line Interfaces (CLIs) within tmux sessions, allowing real-time output relayed directly to messaging platforms like Discord or Slack. Developed as an evolution from OpenClaw, it focuses on conversational control rather than full autonomy by embedding AI CLI interactions into these communication channels. The key features of Discode include a relay-only architecture that avoids additional abstraction layers, support for multiple AI agents such as Claude Code and Gemini CLI, automatic detection of installed AI agents, project isolation with dedicated messaging channels, and the ability to manage several projects using a single Discord bot connection. Technically, it operates locally without cloud dependencies, utilizing persistent tmux sessions that remain active across disconnections. Written in TypeScript, Discode employs a dependency injection pattern for enhanced testability and is compatible with macOS (as developed), Linux, and Windows through WSL, though not natively on Windows due to the absence of tmux support. Installation can be achieved globally via npm or Bun commands, through binary installation using curl without needing Node runtime, or by sourcing from the GitHub repository. Users must ensure they have the requisite prerequisites such as tmux version 3.0+, Bun version 1.3+, and a configured Discord bot with specific permissions and intents. Discode offers user-friendly features like automatic setup commands, session management tools, and CLI references to streamline integration into existing workflows. The project is open for contributions under the MIT License, emphasizing strict adherence to TypeScript standards. By enabling developers to interface with AI CLIs remotely via Discord, Discode enhances workflow efficiency and provides greater control over coding tasks. Keywords: #phi4, AI CLIs, Bun, Discoding, Discord, OpenClaw, TypeScript, conversational control, daemon process, multi-agent support, persistent sessions, project isolation, real-time streaming, tmux
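Conceptually the relay sits on top of ordinary tmux plumbing, which Discode automates; the snippet below only illustrates that underlying idea with a hypothetical session name, and is not Discode's own interface.

```sh
tmux new-session -d -s myproject 'claude'                   # run an agent CLI in a persistent, detached session
tmux send-keys -t myproject 'fix the failing test' Enter    # input that would arrive from a Discord message
tmux capture-pane -p -t myproject | tail -n 20              # output that would be relayed back to the channel
```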
    github.com a day ago
337.  HN Show HN: Mtb – An MCP sanity checker for vibe coding
"Make the Bed (mtb)" is an MCP sanity checker for AI-driven coding projects inspired by a Calvin & Hobbes comic strip, aimed at preventing "vibe-coded" projects—those created with enthusiasm but without considering existing solutions or maintenance costs. It guides developers using structured questions and complexity metrics to favor established tools over reinvention. The tool features several key components: **Consult**, which employs a 5 whys framework for evaluating new features; **Stats**, providing software composition analysis for complexity and COCOMO cost estimates; **Checklist**, ensuring operational readiness through checks like CI/CD, monitoring, and documentation; and **Compare**, analyzing the impact of changes on code complexity and maintenance. mtb integrates with environments such as VS Code and OpenAI Codex and is open-source under the MIT license, promoting contributions while prioritizing simplicity. It exemplifies its principles by using lightweight dependency scanning tools in self-assessments, advocating for thoughtful development that emphasizes problem-solving over unnecessary complexity, akin to making the bed rather than building a robot to do it. Keywords: #phi4, AI agents, CI/CD, CLI tool, COCOMO, GitHub, Go vet, MCP, Make the Bed, Socratic method, Syft dependency, automated tests, code analysis, complexity metrics, cyclomatic complexity, dependencies, deployment pipeline, documentation, govulncheckExtracted Keywords: Make the Bed, govulncheckKeywords: Make the Bed, monitoring, on-call, operational readiness, sanity checker, scc, security audit, software maintenance, transitive modules, vibe coding
    github.com a day ago
350.  HN Show HN: Context Lens: View your CLI's agent context in realtime
**Context Lens** is a local proxy tool designed for developers to analyze and visualize how their coding tools interact with Large Language Models (LLMs) in real-time, without necessitating code modifications. It supports various tools such as Claude Code, Codex, Gemini CLI, Aider, and Pi by capturing API calls during usage. Key features include the ability to break down a session's context window into components like system prompts and tool results, track costs per turn or session, and differentiate interactions between main agents and subagents through conversation threading. It also offers insights into token usage and cost distribution among different agents, as well as visual tools for understanding changes in context over time. The installation of Context Lens can be achieved globally via `pnpm` or `npm`, or run directly using `npx`. Users must set up specific environment variables to direct traffic through the proxy. It supports reverse proxies for HTTP and mitmproxy for HTTPS interception, catering especially to tools like Codex, with configurable CLI options for privacy settings and UI management. Context Lens is particularly beneficial for developers seeking to understand the financial aspects of using coding agents by analyzing context composition rather than just token usage. Its local operation ensures data privacy without reliance on cloud services, making it suitable primarily for individual optimization efforts rather than team or production-level monitoring. In contrast with observability tools like Langfuse and Braintrust that require code instrumentation, Context Lens captures API interactions transparently as a proxy. It includes features to identify potential issues such as large tool results and overflow risks while supporting automatic tool recognition. Sessions are stored locally with options for data reset via the UI, and it adheres to an MIT license for open-source use. Keywords: #phi4, CLI, Context Lens, HTTPS interception, LHAR export, LLM API, coding tools, conversation threading, cost tracking, installation, privacy mode, proxy, reverse proxy, token usage
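The proxy is wired in by pointing a tool's base-URL environment variable at it; the package name, port, and exact variable names below are assumptions for illustration, so check the project's README for the real values.

```sh
npx context-lens                                   # start the local proxy (package name assumed)
export ANTHROPIC_BASE_URL="http://localhost:4000"  # route Claude Code traffic through the proxy (port assumed)
export OPENAI_BASE_URL="http://localhost:4000"     # route OpenAI-compatible tools the same way
claude                                             # use the coding tool as usual; calls now show up in the UI
```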
    github.com a day ago
351.  HN Show HN: Proxima – local open-source multi-model MCP server (no API keys)
Proxima is an open-source local multi-model AI orchestration server designed to facilitate the connection and management of various AI providers through a single endpoint, eliminating the need for API keys. It enables users to interact with multiple AI models like ChatGPT, Claude, Gemini, and Perplexity using existing browser sessions, supporting tasks such as chat, search, translation, and coding. Proxima's main features include access via a unified endpoint (`/v1/chat/completions`), ensuring privacy by running locally on the user’s machine, and compatibility with multiple AI providers through an intelligent routing system that selects the best provider based on availability and task requirements. The platform offers over 45 Model Context Protocol (MCP) tools for diverse functionalities like content analysis, session management, and file handling. To get started, users can download Proxima via GitHub or install it directly by running `npm start`. Configuration involves logging into AI providers through a local interface and setting up MCP in supported environments such as VS Code. The system is versatile, supporting HTTP requests and SDKs for Python and JavaScript, making it adaptable to various development needs. It integrates with applications like Cursor, VS Code, and Gemini CLI via configurable MCP server commands and provides comprehensive documentation and troubleshooting resources. Proxima's license restricts its use to personal, non-commercial purposes, emphasizing privacy and user control over data interactions. In essence, Proxima serves as a flexible local gateway for managing multiple AI services seamlessly within development environments without compromising privacy or requiring external API credentials. Keywords: #phi4, AI providers, API keys, CLI tools, Electron app, JavaScript, MCP server, OpenAI-compatible, Proxima, Python, REST API, SDKs, Smart Router, architecture feedback, browser sessions, local gateway, multi-model, non-commercial use, orchestrate workflow, reliability observability, troubleshooting
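Since the endpoint is OpenAI-compatible, a request would look roughly like the following; the port number and model name are assumptions, and only the `/v1/chat/completions` path comes from the summary.

```sh
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "auto",
        "messages": [{"role": "user", "content": "Summarize the README of this repo"}]
      }'
```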
    github.com a day ago
355.  HN Codex CLI vs. Claude Code on Autonomy
Srihari Sriraman's blog post on Nilenso examines the contrasting autonomy levels of Codex CLI and Claude Code, two coding agents, highlighting how system prompts influence their behaviors and operational approaches. Codex identifies as a "coding agent" focused on achieving goals collaboratively with users, whereas Claude positions itself more as an interactive tool for assisting user tasks. While Codex exhibits higher autonomy by persisting in task completion without constant user input, Claude encourages interaction through questions and seeking clarifications from users. Codex is characterized by its support for proactive actions and creative problem-solving, especially in the absence of prior context. In contrast, Claude favors a cautious approach that emphasizes simplicity and discourages over-engineering. Philosophically, Codex prioritizes task completion even with minimal user consent, whereas Claude stresses alignment with user preferences, requiring approval before proceeding. The post underscores system prompts as critical in directing these AI models' behaviors, suggesting the behavioral differences stem from how each model interprets such instructions. This analysis illuminates that understanding system prompts can provide deeper insights into the functionalities and intended applications of AI tools like Codex and Claude. Keywords: #phi4, AI tools, Claude Code, Codex CLI, RL (Reinforcement Learning), ambition, autonomy, coding agent, collaboration, identity, inference, interactive mode, model behavior, non-interactive mode, persistence, post-training, proactiveness, restraint, software engineering tasks, system prompts, task completion, user alignment
    blog.nilenso.com a day ago
455.  HN Now I see why OpenClaw is popular
OpenClaw is emerging as a significant tool for startups navigating the competitive AI sector by facilitating connections between AI providers and messaging tools while managing computer operations. Its primary advantage lies in streamlining development processes, allowing companies to avoid building custom solutions from scratch; one startup, for example, had previously built its own Express.js websocket server wired to the Gemini CLI for exactly this purpose. OpenClaw provides vendor independence along with well-documented integration options, improving security and ease of maintenance for its users. For one startup, it enables a user-friendly agent feature accessible to non-technical users, while another utilizes it as a backend system to handle JSON manipulation tasks. By integrating OpenClaw, both companies can concentrate on innovation rather than infrastructure concerns, thereby addressing specific needs in AI application management and change management more efficiently and creatively. Keywords: #phi4, AI agents, CTO, Expressjs, Gemini CLI, Hetzner, JQ, JSON, OpenClaw, agentic AI, change managers, chat interfaces, chokidar, computer control, creativity gateway, development experience, infrastructure, messaging tool, non-technical users, provider abstraction, startups, vendor-independent, websocket server
    tornikeo.com 2 days ago
504.  HN Show HN: Fuelcheck CLI – Monitor token usage across the modern AI providers
Fuelcheck CLI is a command-line utility developed in Rust designed for monitoring and managing token usage across various AI providers, offering data outputs compatible with text or JSON formats suitable for dashboards and scripts. It features multi-provider checks, automation-friendly JSON outputs, local cost scanning capabilities, live TUI watch mode, and the ability to customize provider sources using options like OAuth, web, API, CLI, and local. To install, users can use `cargo install fuelcheck-cli` or build from source with `cargo build --release`. Configuration is initiated via `fuelcheck-cli setup`, which auto-detects local credentials for providers such as Codex, Claude, and Gemini. Users can retrieve usage data using `fuelcheck-cli usage` and calculate costs with `fuelcheck-cli cost --provider codex`. The live watch mode can be activated through `fuelcheck-cli usage --watch`. Configuration files allow users to specify provider details including ID, source type (e.g., OAuth, API), and optional elements like cookies or API keys. The setup process varies based on the authentication method and is detailed in the tool's documentation for each supported AI provider. Fuelcheck CLI supports a wide array of providers including Codex, Claude, Gemini, Cursor, Factory (Droid), MiniMax, Kimi, Copilot, Kiro, Vertex AI, JetBrains AI, Amp, Warp, and OpenCode, enabling users to tailor their monitoring setups through environment variables or configuration files according to specific provider requirements. Keywords: #phi4, AI, AI providers, API, API key, CLI, CodexBar, Fuelcheck CLI, JSON, OAuth, Rust, TUI, TUI watch mode, command-line, command-line utility, configuration, cost, local, local cost scan, multi-provider, scan, token, token usage, utility, watch
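Putting the commands from the summary together, a typical session looks like this:

```sh
cargo install fuelcheck-cli            # or build from source with `cargo build --release`
fuelcheck-cli setup                    # auto-detects local credentials (Codex, Claude, Gemini, ...)
fuelcheck-cli usage                    # token usage across configured providers
fuelcheck-cli usage --watch            # live TUI watch mode
fuelcheck-cli cost --provider codex    # local cost scan for a single provider
```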
    github.com 2 days ago
532.  HN Show HN: SkillDeck – macOS app to manage skills across multiple AI agents
SkillDeck is a macOS application designed to streamline the management of skills across various AI code agents by providing a desktop graphical user interface (GUI). This tool eliminates manual file editing and symlink configuration, offering users an intuitive way to manage their development environment. SkillDeck supports multiple AI code agents such as Claude Code, Codex, Gemini CLI, Copilot CLI, and OpenCode, enabling seamless interaction through features like multi-agent support, a unified dashboard, one-click installation from GitHub, automatic updates, and an SKILL.md editor with live preview functionality. The application is built using the Model-View-ViewModel (MVVM) architecture and leverages @Observable in macOS 14+ to monitor changes efficiently. The system treats directories containing SKILL.md files as a database for storing skills, which simplifies file management tasks. Users can install SkillDeck through several methods: by downloading a universal binary from GitHub, using Homebrew, or building it from source with Swift on macOS Sonoma. This flexibility ensures that developers of varying skill levels can easily set up and use the application. SkillDeck is designed to ensure thread-safe access to the filesystem using Swift actors, which enhances its performance and reliability. The project encourages community contributions by allowing users to fork and submit pull requests, in line with guidelines outlined in its development documentation. Licensed under MIT, SkillDeck aims to provide a robust tool for developers seeking an efficient way to manage AI agent skills within their macOS environment. Keywords: #phi4, AI agents, CLI, GUI, GitHub, Homebrew, MIT license, MVVM architecture, SKILLmd editor, SkillDeck, SkillManager, Sonoma, Swift, Xcode, YAML parsing, agent assignment, auto-refresh, build from source, contributing, desktop app, filesystem database, installation, macOS, multi-agent support, services actor, skills management, symlink management, universal binary, update checker
    github.com 2 days ago
563.  HN Kintsugi
Kintsugi is a specialized development environment created by Sonar designed to enhance the workflow of CLI agent users in managing and reviewing AI-generated code changes. It operates as an Agentic Development Environment (ADE), focusing on orchestrating agents for code review rather than direct coding, which distinguishes it from conventional Integrated Development Environments (IDEs). The system augments existing CLI agents such as Claude Code, Gemini CLI, and Codex by integrating visual capabilities to improve their functionality without supplanting these tools. At present, Kintsugi's support is exclusive to the Claude Code agent, thereby providing a tailored interface for reviewing and managing code changes produced by this specific AI tool. Keywords: #phi4, AI-generated changes, Agentic Development Environment (ADE), CLI agent, Claude Code, Codex, Gemini CLI, Kintsugi, Sonar, agents, code review, orchestration, quality checks, security checks, visual capabilities, workflow
    events.sonarsource.com 2 days ago
586.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a tool specifically crafted to manage numerous instances of the software Claude Code, each housed within its distinct Docker container. It provides an intuitive dashboard facilitating oversight and swift setup with pre-configured defaults, ensuring efficient session management. The underlying container-based architecture guarantees isolation from the host system while offering faster initialization compared to traditional virtual machines, allowing parallel task execution without interference among sessions. The tool is initially set up with Ubuntu 24.04, Node.js 24 (LTS), and Claude Code version 2.1.32, along with optional integrations like Gemini CLI and Slack read access. It features a web-accessible terminal via ttyd, retains conversation histories for ongoing tasks, and securely manages authentication tokens. Key functionalities of SafeClaw include lightweight container management, independent session operation with rapid start/stop processes, persistent conversation history, straightforward integration of additional tools, and a user-friendly command-line interface to manage sessions. The dashboard aids in creating and managing sessions while displaying live activity, making SafeClaw ideal for research or experimentation requiring multiple concurrent instances of Claude Code. Keywords: #phi4, CLI, DX plugin, Docker, Gemini, GitHub CLI, JSONL files, Nodejs, Playwright MCP, SafeClaw, Slack, Ubuntu, authentication, auto-compact, containers, context usage, environment variables, npm scripts, secrets management, tmux, ttyd, volume mounts
    github.com 2 days ago
709.  HN Watching Code Fly By
On February 14, 2026, the author explores the advantages and significance of rapidly observing code changes—referred to as code "flying by"—in contexts like diffs from pull requests or through tools like Claude Code. Often overlooked or undervalued, this approach enables developers to swiftly identify potential issues such as poor encapsulation, unnecessary system scans, unwanted dependencies, and misplaced fixes. The skill of quickly assessing these changes is likened to the rapid interpretation of road signs or sports broadcasts, where seasoned code readers can detect problems efficiently. While tools like the Gemini CLI currently provide effective displays of relevant code modifications, there remains room for improvement in how this information is presented. The author underscores that although thorough reading remains valuable, quick assessments are sometimes adequate, particularly when supported by tests or AI-driven confidence measures. This method's utility is compared to reviewing status reports or stock listings, underscoring its increasing relevance and importance within the realm of software development. Keywords: #phi4, AI coding, CLI, code, dependencies, diffs, logic encapsulation, performance, problem location, pull requests, readers, terminal tools, tests pass
    www.natemeyvis.com 3 days ago
749.  HN Show HN: DevDay – End-of-day recap for AI coding session
DevDay is a command-line utility tailored for developers who utilize AI coding assistants such as OpenCode, Claude Code, and Cursor. It offers an end-of-day recap by analyzing local session data, aligning it with Git commits, and optionally producing standup summaries through services like OpenAI or Anthropic, all while prioritizing privacy by executing operations locally unless users specifically opt for LLM-generated summaries. The tool’s key features include the ability to scan AI coding sessions without transmitting data externally (except when summary generation is chosen), presenting details such as tokens used, estimated costs, session durations, and models involved. DevDay can also categorize sessions by project alongside corresponding Git commits, and it facilitates the creation of first-person standup messages. Currently supporting macOS, DevDay installs through npm with a straightforward command (`npm install -g devday`) and provides various command options to generate recaps for today's work or specific dates in different formats. Users can enable summary generation by configuring API keys for OpenAI or Anthropic. Additionally, the tool assesses session durations based on message processing times and estimates costs using token counts when necessary. Keywords: #phi4, AI coding, API key, Anthropic, Claude Code, Cursor, DevDay, LLM summaries, OpenAI, OpenCode, cost estimation, git commits, local data, macOS support, message processing, model pricing, npm install, project directory, standup summaries, token counts
    github.com 4 days ago
752.  HN GLM-5 topped the coding benchmarks. Then I used it
GLM-5, an open-source AI model developed by Zhipu AI under the MIT license, demonstrates high efficacy on coding benchmarks such as SWE-bench and Terminal-Bench 2.0 but shows mixed results in more complex evaluations. When tested on a unique NP-hard problem (KIRO) and Terminal-Bench, GLM-5's performance was inconsistent; it showed competitive capabilities in some best-case scenarios but often generated invalid outputs with high variability between trials. Furthermore, the model frequently encountered timeout issues, indicating challenges in maintaining reliable execution under practical constraints. In the KIRO test, GLM-5 performed averagely compared to other agents and frequently failed to complete tasks within time limits. On Terminal-Bench, its success rates varied significantly based on different frameworks, with Claude Code achieving 40.4% task completion and Mistral Vibe at 48.3%. This contrasts sharply with Zhipu AI's reported scores of 56-61%, attributed to differences in testing conditions such as time limits, infrastructure, and model parameters. Analysis of execution traces reveals that while GLM-5 comprehends appropriate algorithms, it struggles with the depth and reliability required for consistent task completion. The model also faced difficulties with file editing tasks due to unfamiliar formats, suggesting potential improvements through fine-tuning on specific agent interfaces. Overall, although not fundamentally flawed, GLM-5's real-world performance indicates a need for enhancements to ensure a more consistent user experience, highlighting the gap between its theoretical benchmarking success and practical usability in varied contexts. Keywords: #phi4, API, Anthropic, CPU constraints, Claude Code, Coding Plan subscription, GLM-5, Go condition, HuggingFace, KIRO, MIT License, Mistral Vibe, NP-hard optimization, OpenAI-compatible, SWE-bench, Terminal-Bench, Zhipu AI, agent frameworks, coding benchmarks, file editing, fine-tuning, invalid output, memory constraints, open-source, think mode, timeout, token limits, trajectory analysis, variance, wall-clock time limits
    charlesazam.com 4 days ago
842.  HN Weird System Prompt Artefacts
The article by Srihari Sriraman on the nilenso blog delves into "Weird System Prompt Artefacts," discussing the role of system prompts in mitigating undesirable behaviors exhibited by language models. It examines how these prompts evolve over time through various modifications or "patches" to address specific issues like link generation, verbosity, and interaction styles. Key points include:
- **Claude Code** uses instructions to prevent URL creation, aiming to reduce risky behavior stemming from non-programming contexts.
- **Cursor & Codex CLI** focus on using precise tool names for file edits to minimize errors; Cursor employs heuristics due to frequent user-model co-authorship, whereas Codex shifts away from ChatGPT-style interactions toward more autonomous operations.
- **Gemini CLI** and **OpenHands** highlight concerns about token consumption, reflecting an awareness of resource usage during model operations.
- A comparison between **Codex and Gemini** on test management reveals differing philosophies: Codex avoids adding tests to untested codebases, while Gemini advocates for including tests with new features.
These examples collectively illustrate how engineers adapt system prompts to manage learned behaviors and biases in models, enhancing safety and efficiency. Keywords: #phi4, System prompts, URL generation, anti-comment, binary generation, concurrency control, context distraction, corrective instructions, high-verbosity code, identity strings, legacy prompt, link hallucination, markdown etiquette, model behavior, test addition, token consumption, validation phrases, workspace-native behavior
    blog.nilenso.com 4 days ago
901.  HN Show HN: Kintsugi – A desktop app for reviewing Claude Code sessions
Kintsugi is an innovative desktop application developed by Sonar's engineering team to augment Claude Code sessions, functioning primarily as an Agentic Development Environment (ADE). It focuses on orchestrating and reviewing AI-generated code rather than writing it, with the objective of enhancing both code quality and security while preserving rapid development cycles. The tool offers several key features: parallel orchestration of agents, AI-driven code reviews resembling pull requests complete with commenting functions, plan reviews similar to Google Docs, and integrated Sonar analysis for detecting local issues. Although predominantly constructed using Claude Code itself, Kintsugi is currently only available on macOS, despite internal versions existing for Linux and Windows platforms. The application serves as a prototype aimed at gathering user feedback and guiding future improvements. Kintsugi emphasizes seamless visual integration with CLI agents, providing users with extensive workflows to confidently manage AI-generated code changes, thus ensuring robust and secure development practices. Keywords: #phi4, AI code review, AI generated code, Agentic Development Environment (ADE), CLI agent, Claude Code, Code Review, Codex, Gemini CLI, IDE-like, Kintsugi, Sonar analysis, SonarQube, desktop app, feedback, macOS, orchestration, parallel agents, prototype, quality checks, security checks, visual capabilities
    events.sonarsource.com 5 days ago
965.  HN Safe YOLO Mode: Running LLM Agents in VMs with Libvirt and Virsh
The guide offers comprehensive instructions for setting up isolated environments for Large Language Model (LLM) agents on Linux servers using Libvirt and Virsh, specifically within virtual machines. This approach is crucial in minimizing security risks by creating controlled environments, especially when LLMs operate with extensive permissions ("yolo mode"). The document underscores the advantages of Libvirt over Lima, highlighting its suitability for production-grade server contexts due to lower resource demands and robust management capabilities. To set up this environment on Ubuntu/Debian systems, users must install QEMU, libvirt, and associated tools. The guide details the process of downloading a pre-built Ubuntu cloud image, resizing it, and creating a new virtual machine using `virt-install`. Various virsh commands are provided to manage these VMs, including starting or stopping them, accessing consoles, managing snapshots, and cloning. The document also offers additional tips for optimizing the VM environment with tools like Tmux, fzf, Go, Docker alternatives such as containerd/nerdctl, and Node.js. It addresses SSH access configuration via Tailscale or internal IPs to enable remote management. For network configurations, while default NAT setups are suggested, bridged networking is recommended for production environments. Users can further tailor their VMs using custom cloud-init scripts for automated provisioning. The guide concludes by summarizing essential commands and installation steps to assist users in efficiently implementing the setup process. Keywords: #phi4, LLM agents, Libvirt, Linux servers, Tailscale, Ubuntu, VMs, Virsh, cloud-init, isolation, networking, provisioning, qemu-kvm, snapshots
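A condensed sketch of the flow described above, with illustrative names and sizes (the exact package set, image file name, and `--os-variant` value may differ on your system):

```sh
# Debian/Ubuntu host packages
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients virtinst

# Grow the downloaded Ubuntu cloud image and create the VM
qemu-img resize ubuntu-cloud.img +20G
virt-install --name llm-agent --memory 4096 --vcpus 2 \
  --disk path=ubuntu-cloud.img --import --os-variant ubuntu24.04 --noautoconsole

# Day-to-day management
virsh list --all
virsh start llm-agent
virsh console llm-agent
virsh snapshot-create-as llm-agent pre-yolo   # snapshot before letting an agent loose
virsh shutdown llm-agent
```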
    www.metachris.dev 5 days ago
   https://github.com/nibzard/agentlab   3 days ago
985.  HN Conductor Update: Introducing Automated Reviews
The Conductor extension for Gemini CLI has introduced an Automated Review feature aimed at improving AI-assisted engineering processes through enhanced validation and reporting following code implementation. This new capability enables developers to ensure that their code meets quality standards and adheres to predefined guidelines, thus facilitating the verification of compliance during development. By generating a comprehensive post-implementation report automatically upon completion of coding tasks, Conductor effectively closes the loop in the development lifecycle, providing an end-to-end solution for maintaining high standards in software engineering practices. Keywords: #phi4, AI-assisted engineering, Automated Reviews, Conductor, Gemini CLI, code quality, coding agent, compliance, context-driven development, execution, markdown files, planning, post-implementation reports, validation, verify step
    developers.googleblog.com 5 days ago
994.  HN Show HN: Context Lens: Devtools for your agent context
Context Lens is a sophisticated local development tool specifically crafted for developers utilizing large language models (LLMs), such as Claude Code, Codex, Gemini CLI, Aider, and Pi. It functions as an intermediary proxy between coding tools and LLM APIs, capturing API calls without necessitating code alterations within the tools themselves. The core features of Context Lens include composition breakdown to provide visual insights into components filling the context window (e.g., system prompts, tool definitions), cost tracking for estimating expenses per turn or session across different models, conversation threading to organize API calls by sessions and interactions between agents and subagents, and an agent breakdown detailing token usage and costs per agent. Additionally, it offers a timeline visualization with filtering capabilities, context diff to show changes over turns, and a findings panel that flags potential issues like large tool results or risks of context overflow. The tool also supports automatic detection and data exporting in LHAR format. Installation is straightforward via npm or pnpm, including direct npx execution, and it accommodates multiple environments through reverse proxies, even handling HTTPS interception as required. Context Lens is designed to operate entirely on a developer's local machine, ensuring privacy and control over captured data, making it particularly useful for developers facing challenges with closed-source tools that cannot be directly instrumented. While it provides detailed observability into LLM session context composition to optimize usage without altering tool code, it is not intended for production monitoring or team dashboards; other solutions like Langfuse are recommended for such needs. The tool operates under an MIT license and stores captured requests both in memory (up to 100) and persistently across restarts. Keywords: #phi4, Agent context, Composition breakdown, Context Lens, Cost tracking, Devtools, Environment Variables, HTTPS interception, Installation, LLM API, Local proxy, Proxy, Reverse proxy, Supported Providers
    github.com 5 days ago
1109.  HN I turned old laptops into an AI coding farm ($15/month vs. Devin's $500)
Ralph Loops is an open-source initiative that repurposes old laptops into a cost-effective autonomous AI coding system, offering significant savings over traditional services by operating at around $15 per month compared to more expensive alternatives like Devin's $500/month service. The project leverages repurposed hardware within a Tailscale VPN on a trusted network and features an architecture comprising one control PC (running Windows) and multiple worker PCs. These workers execute various tasks overnight using tools such as the Claude CLI, with Gemini serving as a backup. The system assigns specific roles to worker PCs, including backend, frontend, tests, design, utility functions, manager, and additional utility operations. Task execution is controlled by scripts like `start-night.sh` and managed by a designated manager PC. Tasks are defined in markdown files stored within a GitHub repository, which acts as the central source of truth for task coordination. Security is a critical component of Ralph Loops, emphasizing operation on trusted networks to ensure configurations, task files, and AI agents undergo strict validation processes that prevent unauthorized access or misuse. Measures include input validation, explicit staging with `git`, and sanitized shell commands to bolster security. The system supports autonomous overnight execution, enabling the manager PC to review outcomes in the morning, generate tasks for any failures, and document lessons learned. Designed explicitly for trusted environments due to its reliance on elevated privileges and private networks, Ralph Loops is unsuitable for untrusted or public-facing deployments. Setup prerequisites include at least three old laptops running Linux, a Tailscale account, and access either to the Claude API or an Anthropic Max subscription, along with Gemini CLI. Currently in version 1.0, Ralph Loops features heartbeat monitoring, task recovery, and automatic validation. Future enhancements aim to integrate web dashboards and support multiple projects. Operating under the MIT License, Ralph Loops provides comprehensive documentation and a contributing guide, facilitating user implementation and extension of its capabilities. Keywords: #phi4, AI coding farm, Claude CLI, Gemini fallback, Git coordination, Tailscale VPN, autonomous agents, manager-worker architecture, mentor oversight, open-source system, repurposed hardware, security model, task execution
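Tasks live as markdown files in the coordinating GitHub repository; the exact schema is defined by the project, so the file below is only a hypothetical illustration of what one such task might contain.

```markdown
# Task: add-retry-to-http-client
role: backend          # which worker PC picks this up
priority: high

## Goal
Add exponential backoff with jitter to the outbound HTTP client.

## Done when
- Unit tests cover the retry path and pass locally.
- Changes are staged explicitly and pushed on a feature branch for morning review.
```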
    github.com 6 days ago
1133.  HN I benchmarked 4 coding agents on an NP-hard problem I solved 8 years ago
This summary examines the comparative analysis of four coding agents—Claude Code, Codex, Gemini CLI, and Mistral—on an unpublished NP-hard fiber network optimization problem initially solved by the author using C++. The task involves designing a fiber network to connect cell towers with specific constraints on redundancy loops and branches. Claude Code notably outperformed the author's solution in one of three trials, demonstrating its efficacy under various testing conditions that included different programming languages (Python versus Go) and varying time limits (30 minutes versus 1 hour). The study's key findings reveal several critical insights into AI agent performance optimization. First, the practice of prompt engineering—offering a specific target hint—significantly enhanced agent performance compared to vague prompts like "keep improving," which were particularly ineffective for weaker agents such as Mistral. The choice of programming language played a pivotal role in the benchmarking process; Python was found to be superior due to Go's challenging compilation requirements, which often led to invalid solutions from skipped validation steps. Furthermore, Claude Code’s iterative improvement strategy proved more successful than Mistral's one-shot heuristic approach. This highlights the advantage of continuous refinement over single-attempt solutions in complex problem-solving scenarios. Additionally, while increased time allocation did not universally enhance performance, it benefited agents like Claude Code that were equipped with effective frameworks to utilize additional time for improvement. The analysis also identified common failure modes, including constraint violations and challenges related to output formatting or file saving—issues arising from attempts at intricate optimizations without sufficient validation steps. Overall, the study underscores the significance of prompt engineering, iterative solution development, and strategic language selection in optimizing AI agent performance on complex tasks. While acknowledging the limitations of this single-task benchmark, such as a small sample size and specific conditions, it offers valuable insights into the capabilities of coding agents beyond conventional benchmarks. Keywords: #phi4, Docker container, Go language, NP-hard problem, Python, agent reliability, algorithm efficiency, benchmarking, coding agents, constraint violations, fiber network, iterative optimization, simulated annealing, solution validation
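The "iterative improvement" strategy credited to the stronger agents is essentially a refinement loop such as simulated annealing (listed among the keywords above). The generic Python skeleton below illustrates the pattern; it is not the author's or any agent's actual solution, and `cost` and `perturb` stand in for problem-specific functions.

```python
import math
import random

def anneal(initial, cost, perturb, steps=10_000, t0=1.0, cooling=0.999):
    """Iteratively refine a feasible solution: apply a small random change, keep it if it
    helps, and occasionally keep a worse one to escape local optima."""
    current, best = initial, initial
    temp = t0
    for _ in range(steps):
        candidate = perturb(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling            # gradually become greedier
    return best
```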
    charlesazam.com 6 days ago
1180.  HN Show HN: SuperLocalMemory– Local-first AI memory for Claude, Cursor and 16+tools
SuperLocalMemory V2 addresses the challenge of "amnesia" in AI tools by providing a robust local-first memory system that allows developers to maintain continuity across sessions without repeatedly re-explaining project contexts, coding preferences, and past decisions. It ensures data privacy and ownership through local storage and seamlessly integrates with over 16 AI tools like Claude Desktop, Cursor, Windsurf, VS Code, among others, requiring zero setup or external configurations such as API keys. The system employs a sophisticated 10-layer architecture, featuring A2A Agent Collaboration, Web Dashboard, Hybrid Search, Pattern Learning, and Knowledge Graphs to enhance functionality. Key technical aspects include its foundation on research like the A2A Protocol, GraphRAG, MACLA Bayesian learning, and A-RAG hybrid search, adapted for local implementation. It utilizes SQLite with FTS5 and TF-IDF vectors to achieve efficient searching capabilities, maintaining sub-second performance even with large datasets. The system is designed to recognize user patterns over time, offering more personalized assistance while supporting multiple profiles to prevent context overlap between projects. Installation is straightforward via npm or by cloning its GitHub repository, as SuperLocalMemory V2 auto-configures itself for various environments and tools. Compared to cloud-based alternatives that often entail costs and privacy issues, SuperLocalMemory V2 stands out by being free, local, and fully private, making it an all-encompassing solution for persistent context maintenance in AI-driven development settings. Keywords: #phi4, AI memory, Bayesian confidence, CLI commands, SQLite storage, SuperLocalMemory, hierarchical clustering, knowledge graph, local-first, multi-tool integration, pattern learning, privacy, real-time dashboard, zero cost
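The storage approach (SQLite with FTS5) can be illustrated with a few lines of standard-library Python; the table layout and file name are hypothetical rather than SuperLocalMemory's actual schema, and the example assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect("memory.db")  # hypothetical database file
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(content, project)")
conn.execute(
    "INSERT INTO memories (content, project) VALUES (?, ?)",
    ("Prefer pytest over unittest in this repo; tests live under tests/unit", "acme-api"),
)
conn.commit()

# Full-text query, best matches first (FTS5 exposes a built-in rank)
for (content,) in conn.execute(
    "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank", ("pytest",)
):
    print(content)
```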
    github.com 6 days ago
1272.  HN Show HN: Open-Source Skills for AI Agents
The "Awesome AI Agent Skills" repository provides a comprehensive suite of over 70 open-source skills designed to bolster AI agents' functionality across diverse domains such as artificial intelligence/machine learning (AI/ML), API integration, code development, communication, and data analytics. These modular skills adhere to a standard format, ensuring compatibility with popular platforms like Claude Code, OpenAI Codex, and GitHub Copilot. Each skill is organized in its own directory, complete with a SKILL.md file that offers structured instructions and metadata, enabling users to seamlessly integrate these capabilities into their projects. The repository categorizes the skills into 14 distinct areas, including data analysis, cloud monitoring, content strategy, and security auditing, aiming to streamline development tasks such as model training, API design, code documentation, and marketing analytics. The project encourages community involvement by inviting contributions for new or improved skills, as outlined in the CONTRIBUTING.md file. Released under the MIT License, this collection supports extensive usage and collaboration within the AI community, facilitating innovation and efficiency in AI agent development. Keywords: #phi4, AI Agents, Automation, Categories, Code Generation, Community-driven, Contributions, Data Analysis, Design, Development, Documentation, Integration, License, MIT, Markdown, Modular, Open-Source, Platforms, Repository, Reusable, SKILLmd, Security, Security Audits, Skills, Workflow, Writing, YAML
    github.com 7 days ago
1281.  HN Entire - hooks into your Git workflow to capture AI agent sessions
The tool "Entire" is designed to enhance the integration of AI agents within a Git workflow by automatically capturing and indexing AI agent sessions during code development. It stores these sessions as metadata in a dedicated branch (`entire/checkpoints/v1`), separate from traditional code commits, allowing developers to maintain a searchable history of how their code was crafted. Entire integrates seamlessly with Git, capturing session data on every push and offering robust workflow management through commands like `enable`, `disable`, `status`, `rewind`, and `resume`. These features facilitate efficient session tracking and version control, accommodating two checkpointing strategies: manual-commit and auto-commit. To set up Entire, prerequisites include having Git installed, operating within a supported OS (macOS or Linux via WSL), and using an authenticated AI agent CLI like Claude Code or Gemini CLI. Installation can be performed through Homebrew or Go, followed by running `entire enable` to initialize hooks in the project repository. The workflow involves enabling hooks with either checkpointing strategy, managing sessions in the background, and utilizing commands for rewinding changes or restoring session metadata. Configuration is handled via JSON files located in a `.entire/` directory within the project, allowing users to set preferences such as strategy type, logging levels, and telemetry options. Users can also make local configuration adjustments that won't affect team settings when committed to Git. Common issues like "Not a git repository" errors or SSH authentication problems are addressed by ensuring the current working directory is a Git repository or configuring SSH host keys appropriately. Entire leverages `mise` for task automation and dependency management, and it supports screen reader accessibility through an accessible mode. The project encourages community engagement by inviting users to report bugs or request features via GitHub issues, underscoring its commitment to continuous improvement in facilitating AI-driven development within Git workflows. Keywords: #phi4, AI agent, CLI, Entire, Git, checkpoints, commits, configuration, hooks, sessions, strategies, troubleshooting, workflow, worktrees
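A minimal workflow using the subcommands named above (exact flags and prompts may differ; see the project README):

```sh
cd my-project            # must be a Git repository
entire enable            # install the hooks; pick manual-commit or auto-commit checkpointing
entire status            # confirm capture is active
# ... code with Claude Code or Gemini CLI, commit and push as usual ...
entire rewind            # roll back to an earlier checkpoint
entire resume            # restore session metadata and continue a captured session
entire disable           # remove the hooks when finished
```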
    github.com 7 days ago
1323.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a management tool for handling multiple instances of Claude Code running in Docker containers, providing an efficient and isolated environment compared to full virtual machines. It features an intuitive dashboard that allows users to oversee sessions easily, with the ability to set up new instances using default settings swiftly. The platform supports concurrent execution of diverse research tasks without session interference, ensuring each conversation history is saved locally for persistence across restarts. Users can start new instances through a simple script (`./scripts/run.sh`), customize their setup by mounting local projects, and manage sessions with additional scripts provided. SafeClaw offers optional integrations such as Gemini CLI or Slack read access, operating on an environment that includes Ubuntu 24.04, Node.js 24 (LTS), Claude Code version 2.1.32, GitHub CLI, Playwright MCP, among other tools. Security is maintained by running with `--dangerously-skip-permissions` in a containerized setup, which is deemed secure. Authentication tokens are securely managed for each session, with the option to add further secrets as needed. The dashboard, initiated through `node dashboard/server.js`, enables users to create and control sessions while viewing live iframes of active ones. Interaction with SafeClaw is facilitated via various npm scripts and shell aliases within containers. Keywords: #phi4, CLI, Chromium, Docker, Gemini, GitHub, JSONL files, MCP, Nodejs, Playwright, SafeClaw, Slack, Ubuntu, aliases, authentication, containers, environment variables, npm scripts, skills, tmux, ttyd, web terminal
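The two entry points mentioned above are enough for a basic session:

```sh
./scripts/run.sh            # start a new Claude Code instance in its own container
node dashboard/server.js    # serve the dashboard for creating and monitoring sessions
```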
    github.com 7 days ago
1355.  HN Gemini writes, Claude polishes, JetBrains rests: an agent development pipeline
In November 2025, a seasoned technical director transitioned from traditional Integrated Development Environments (IDEs) to an innovative agent-based development pipeline leveraging AI models for enhanced efficiency and cost-effectiveness. This new workflow utilizes three AI models: Gemini handles routine code generation tasks such as boilerplate creation, GLM steps in when Gemini reaches its limits, and Claude Code is reserved for more complex duties like refactoring and making architectural decisions. The director developed a Command Line Interface (CLI) tool named Gokin in Go to manage these AI resources efficiently, ensuring cost savings by using less expensive models for routine tasks while reserving the pricier Claude model for sophisticated work. The pipeline operates much like an assembly line where each AI agent manages specific stages of software development. This strategy results in significant cost reductions—around $130-$180 monthly per project or approximately $1500-2000 annually, compared to relying solely on Claude Code. Security is meticulously maintained by redacting sensitive information such as API keys and passwords before processing through the AI models. The agent-based approach not only improves efficiency but also shifts developers' focus from syntax-oriented tasks to higher-level architectural concerns, thus reducing cognitive load and boosting productivity. While IDEs remain useful in specific areas like frontend development, this pipeline is particularly advantageous for backend programming with languages such as Go, PHP, and Python. The open-source nature of Gokin, available on GitHub, encourages community involvement and further enhancements. Keywords: #phi4, AI models, Agent-based programming, Claude Code, Gemini CLI, GitHub Copilot, Go language, Gokin, IDEs, JetBrains Toolbox, agent management, architecture, backend development, cognitive load, cost efficiency, development pipeline, digital juniors, prompt engineering, provider agnosticism, security, technical director, terminal
    The google logo   ginkida.dev 7 days ago
1368.  HN Show HN: Claudit – Claude Code Conversations as Git Notes, Automatically
Claudit is an advanced tool designed to enhance code collaboration by automatically saving conversations from Claude Code into Git Notes for every commit, providing a comprehensive audit trail of discussions leading up to changes in the codebase. It utilizes agent interactions and Git hooks to ensure these conversation notes are consistently attached to commits across multiple developers working within the same repository. A key feature of Claudit is its ability to automatically generate and attach conversation notes during both developer-initiated commits and those made by Claude Code itself, ensuring seamless integration without disrupting workflows. The tool supports collaboration among multiple developers by merging conversation notes from various contributors without data loss, even when multiple notes reference the same commit. It is compatible with Git worktrees, allowing conversations to be scoped to individual branches while sharing hooks across them, which enhances flexibility and efficiency in development environments that utilize branching strategies extensively. Claudit maintains note integrity during rebase operations by leveraging git's `notes.rewriteRef` configuration, ensuring that notes stay linked to their respective commits regardless of any structural changes. Additionally, Claudit handles the complexities introduced by GitHub's "Rebase and merge" strategy by remapping orphaned conversation notes to new commit IDs when SHAs change. To facilitate its use, Claudit offers a suite of commands such as `claudit list` and `claudit show [ref]` for viewing conversation histories, along with `claudit resume <commit>` to continue discussions from specific commits. Developers can visualize these notes through the `claudit serve` command and manage synchronization with remote repositories using `claudit sync push/pull`. The tool also includes a diagnostic feature (`claudit doctor`) to identify configuration issues, ensuring smooth operation. For effective utilization of Claudit, it is necessary to have Git installed along with the Claude Code CLI for session resumption. This setup supports multi-developer synchronization and is essential for maintaining the integrity and accessibility of conversation notes across collaborative projects. Claudit operates under the AI Native Application License (AINAL), which governs its usage and distribution. Keywords: #phi4, Automation, Branches, CLI, Claudit, Commit, Git, GitHub, Hooks, Merge, Rebase, Sync, Worktrees
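The commands below are the ones listed in the summary, arranged into a plausible day-to-day sequence; placeholder arguments such as `<commit>` are left as placeholders.

```bash
claudit doctor            # check that hooks and notes configuration are healthy
claudit list              # list commits that carry Claude Code conversation notes
claudit show HEAD         # display the conversation attached to a given ref
claudit resume <commit>   # reopen the Claude Code session behind a specific commit
claudit serve             # browse conversation notes in a local web view
claudit sync push         # share notes with the remote; `claudit sync pull` fetches them
```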
    The google logo   github.com 7 days ago
1449.  HN Add-MCP CLI: npx skills but for installing MCP servers
The Add-MCP CLI is a command-line interface designed to facilitate the installation of Model Context Protocol (MCP) servers into various coding agents with ease, similar to `npx` for Node.js packages. It supports multiple platforms such as Claude Code, Codex, Cursor, OpenCode, VSCode, among others, and allows installations via URLs or npm packages using straightforward commands. The tool offers a range of options for customizing the installation process, including global or project-specific installations, targeting specific agents with the `-a` flag, specifying transport types (`http`, `sse`) with `--transport/--type`, adding custom HTTP headers through `--header`, setting server names via `--name`, skipping confirmation prompts with `-y`, and installing to all agents using `--all`. A notable feature is its smart detection capability, which automatically identifies coding agents based on the environment: in project mode by searching for config files like `.cursor/mcp.json` and in global mode by detecting globally installed agents. The CLI supports various transport types, including HTTP (default), SSE (deprecated but still supported), and stdio for local servers, while also allowing custom HTTP headers to be passed, although some agents, such as Goose, do not support this feature. The tool provides a `list-agents` command to display all supported coding agents and their installation scope—either project or global. By default, MCP servers are installed in the project context but can be configured for global installation using the `-g` option. The utility of MCP servers lies in enhancing coding agents by integrating external services, databases, file system access, and specialized tools tailored to specific workflows. For troubleshooting, users should verify server URLs and configuration syntax, ensure there are no naming conflicts with existing servers, and check write permissions on target directories. The tool is licensed under Apache 2.0. Keywords: #phi4, Add-MCP CLI, HTTP headers, MCP servers, Model Context Protocol, coding agents, global mode, installation, project scope, smart detection, supported agents, transport types, troubleshooting
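A hedged sketch of typical invocations; the `npx add-mcp` form is an assumption based on the story title, while the flags shown are the ones the summary lists and the URLs and names are illustrative.

```bash
# Install an HTTP MCP server into a specific agent for the current project
npx add-mcp https://example.com/mcp --transport http --name my-server -a cursor

# Global install across all detected agents, skipping confirmation prompts
npx add-mcp https://example.com/mcp -g --all -y

# Show every supported coding agent and whether it is configured per-project or globally
npx add-mcp list-agents
```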
    The google logo   github.com 8 days ago
1455.  HN CLI – hooks into your Git workflow to capture AI agent sessions
Entire is a command-line interface tool designed to enhance Git workflows by integrating AI agent session tracking with code commits across macOS, Linux, and Windows via WSL. It requires Git and an authenticated CLI for either Claude Code or Gemini. The tool captures complete interactions as checkpoints within two strategies: manual-commit, which records checkpoints during user or AI-initiated commits, and auto-commit, which does so after each agent response. Entire offers seamless session management, enabling users to rewind or resume sessions at previous checkpoints. It maintains a separate branch (`entire/checkpoints/v1`) for storing session metadata without affecting the main codebase, supporting multiple concurrent AI sessions on the same commit through git worktrees. The typical workflow involves activating Entire in a repository by installing hooks, allowing AI agent interactions to be tracked automatically in the background. Users can manage sessions via commands like `entire rewind` or `entire resume <branch>`, with an option to disable Entire without impacting code history. Configuration settings are managed through JSON files located in `.entire/`, with project-specific configurations committed to Git and personal preferences typically ignored. Entire provides several commands for its management: enabling (`entire enable`), disabling (`entire disable`), checking status (`entire status`), and managing sessions (`entire rewind` or `resume`). Additional functionalities include cleaning up data, fixing issues, and viewing versions. The development of Entire leverages Mise for task automation, requiring users to install Mise and build the CLI according to its configuration. The tool supports accessible mode for screen readers and offers solutions for common problems like SSH authentication errors and conflicts with shadow branches. Under the MIT License, Entire encourages open-source contributions and bug reporting via its GitHub repository. Keywords: #phi4, AI agent, CLI, Git, checkpoints, commits, configuration, hooks, metadata, sessions, strategies, troubleshooting, workflow, worktrees
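The session-management commands named in the summary, shown in a typical order; `<branch>` stays a placeholder.

```bash
entire enable             # install the Git hooks in the current repository
entire status             # see which strategy is active and what is being tracked
entire rewind             # step back to an earlier checkpoint
entire resume <branch>    # continue a captured AI session from its checkpoint branch
entire disable            # stop tracking without touching existing code history
```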
    The google logo   github.com 8 days ago
1527.  HN Hunter3 Is Not OpenClaw
Hunter3 is an advanced AI assistant designed to seamlessly integrate messaging channels with large language model (LLM) providers and external tools using an IRC-based communication system managed via a WebSocket gateway. It enables real-time interaction management, allowing for on-the-fly self-modifications that ensure automatic reconnection to the IRC server upon changes. The architecture routes messages from various channels to agents interacting with LLMs like Claude CLI, Ollama, or Gemini CLI within a secure framework designed in Go 1.24+. Hunter3 is highly configurable through YAML files and offers extensive extensibility via Model Context Protocol (MCP) servers for API interactions and Docker container management. Key features include self-modifying capabilities, structured logging with zerolog, and support for plugin systems enabling custom event hooks. It provides flexible session management that supports both per-sender and global scopes, along with streaming support for handling incremental responses from LLMs. Built using a pure-Go SQLite database, Hunter3 ensures secure data handling without relying on CGO operations, enhancing portability. The system allows customization through its configuration files, covering IRC server settings and database options, with binaries generated via build commands like `make build`. Overall, Hunter3 stands as a robust framework for developing AI-driven chatbots and assistants, offering significant extensibility through its plugin architecture and MCP systems. Keywords: #phi4, AI assistant, CLI tools, Hunter3, IRC, LLM providers, MCP servers, SQLite, WebSocket, configuration, event hooks, plugins, self-modifying, streaming support
    The google logo   github.com 8 days ago
1595.  HN Show HN: Agx – A Kanban board that runs your AI coding agents
AGX is a local-first Kanban board designed specifically to manage AI coding tasks using autonomous agents. It addresses the challenge of agent persistence by decoupling control from execution planes, enabling constant-cost task resumption without replaying past interactions. AGX leverages PostgreSQL for state management and supports multiple AI providers such as Claude Code, Gemini CLI, and Ollama. The platform emphasizes durable, resumable execution through a bundled dashboard that allows live monitoring of the system's state, alongside features supporting multi-provider integration and customizable project-specific workflows. Unlike conventional chat UIs or hosted SaaS services, AGX functions as infrastructure to reliably operate agents on local machines. It offers straightforward setup requirements, including PostgreSQL (which can be managed via Docker) and any AI provider CLI. Users interact with AGX using commands that facilitate task initialization, creation, execution, and monitoring. The architecture of AGX is split between a control plane, responsible for state management and orchestration within PostgreSQL, and a data plane, where execution tasks are handled by the AGX CLI and Daemon. Its technology stack comprises Next.js, Tailwind CSS, PostgreSQL, Node.js, and TypeScript. The project encourages community contributions through GitHub Discussions and Issues, fostering collaborative development and improvement. Keywords: #phi4, AI agents, CLI, Kanban board, PostgreSQL, agent persistence, autonomous agents, control plane, data plane, durable state, local-first, pg-boss, providers, task execution
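Only the PostgreSQL prerequisite is concrete enough to sketch here; the `docker run` invocation below is one standard way to satisfy it, with illustrative credentials rather than AGX defaults, and AGX's own task commands are omitted because the summary does not spell them out.

```bash
# Run a throwaway PostgreSQL instance for AGX's control plane
docker run -d --name agx-postgres \
  -e POSTGRES_PASSWORD=agx \
  -p 5432:5432 \
  postgres:16
```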
    The google logo   github.com 8 days ago
1597.  HN Show HN: A CLI tool to automate Git workflows using AI agents
"Git PR AI" is a command-line tool that automates Git workflows using artificial intelligence to enhance tasks such as creating branches, preparing pull requests (PRs), and conducting code reviews. It integrates with platforms like GitHub and GitLab through gh and glab respectively, and collaborates with various AI agents including Claude Code, Gemini CLI, Cursor Agent, and Codex CLI. A primary design objective of the tool is to maintain agent-agnostic functionality, allowing users to switch between different AI tools seamlessly without needing custom prompts or adopting specific Message Completion Protocols (MCP). This feature, coupled with a quick setup process from installation to executing the first PR, significantly simplifies Git workflows. The utility offers project management integrations such as utilizing JIRA tickets for automatic branch name and context generation. Installation is straightforward via `pnpm add -g git-pr-ai`, which grants access to various Git subcommands directly in the terminal. It provides numerous features like AI-generated commit messages, contextual PR descriptions, real-time code reviews with improvement suggestions, and weekly summaries for project reviews or standups. These capabilities aim to streamline development processes by reducing manual intervention. "Git PR AI" ensures full compatibility across multiple platforms and AI providers, accommodating diverse user configurations. For further information, users can refer to the comprehensive documentation available in the repository at https://github.com/leochiu-a/git-pr-ai. Additionally, user feedback and inquiries are encouraged to enhance the tool's functionality and usability. Keywords: #phi4, AI agents, Branch creation, CLI tool, Claude Code, Code reviews, Codex CLI, Commit messages, Cursor Agent, Gemini CLI, Git workflows, GitHub, GitLab, Installation, JIRA tickets, PR creation, PR descriptions, Pull Requests, Semantic branch names, Subcommands, Weekly summaries
    The google logo   github.com 8 days ago
1601.  HN Grumpy Julio plays with CLI coding agents
The author shares their journey with Claude Code, an AI-based coding agent, reflecting on initial skepticism due to prevalent issues like code bloat and poor quality. Despite these concerns, the author discovered that Claude Code significantly enhanced productivity for straightforward and repetitive tasks, even without deep technical expertise, by aiding in feature implementation, script writing, and Emacs plugin creation. While acknowledging its utility, the author cautioned against over-reliance on AI-generated code, noting it often necessitates substantial human refinement to achieve production quality and efficiency. Ultimately, the author concluded that while coding agents are beneficial for specific tasks, they should complement rather than replace traditional programming skills and critical thinking in software development. Keywords: #phi4, AI tools, AI-based coding, C++ compiler, Claude Code, Emacs, EndBASIC, EndTRACKER, GitHub, LLMs, NixOS, PRs, Servo, code duplication, coding agents, integration, iteration, maintenance costs, nixpkgs, performance problems, personal productivity, productivity, prompts, review, slop, software bloat, software engineering, software projects, ticket tracker, ticketel, tool belt, web browser
    The google logo   jmmv.dev 8 days ago
1626.  HN Show HN: SpecOps – Spec-Driven Development for Infrastructure as Code
SpecOps is an open-source Command Line Interface (CLI) framework designed to integrate Spec-Driven Development into Infrastructure as Code (IaC) projects, addressing the challenge of ad-hoc scripting by establishing a structured workflow that progresses from idea conception through planning and execution. This technology-agnostic framework supports tools like Terraform, Pulumi, CloudFormation, and Ansible, incorporating over 17 AI coding agents such as Claude Code and GitHub Copilot to assist at every stage. SpecOps automates the generation of project structure, templates, and command files while providing validation checkpoints and documented rollback procedures for each deployment phase. The framework is inspired by GitHub's Spec Kit but specifically tailored for infrastructure engineering, enforcing a systematic IaC approach through five key steps: establishing principles, defining requirements, creating technical plans, generating task breakdowns, and executing deployments. It supports diverse use cases including multi-organization Kubernetes platforms, entire application stacks, and compliance-ready infrastructures. SpecOps is MIT licensed, encouraging community contributions to enhance AI integrations, cloud templates, documentation, and testing processes. Users can install the CLI tool via a specific command from GitHub, which underscores SpecOps' goal of fostering more organized, reliable, and AI-assisted IaC methodologies for infrastructure teams. Keywords: #phi4, Ansible, ArgoCD, Cilium, Compliance, GitHub, GitOps, Grafana, Infrastructure as Code, Kubernetes, MIT License, Multi-tenancy, Prometheus, RBAC, Scalability, Security, Spec-Driven Development, SpecOps, Terraform
    The google logo   github.com 8 days ago
1666.  HN Show HN: CodeGraphContext- An MCP server that indexes code into knowledge graphs
CodeGraphContext is an advanced MCP server developed to index local code into graph databases, significantly enhancing the capabilities of AI assistants in understanding large codebases. It addresses limitations in traditional RAG systems that often provide excessive or irrelevant context by utilizing Graph RAG technology to deliver precise, relationship-aware insights. Key features include building detailed architecture maps for contextual clarity, synchronizing documentation with evolving code changes, and supporting AI tools in navigation, completion, and debugging tasks. As an MCP server, CodeGraphContext integrates seamlessly with various development environments like VS Code, Gemini CLI, and Cursor. The system offers a range of functionalities: it constructs knowledge graphs from code components, facilitates complex relationship queries (such as callers, callees, and class hierarchies), provides pre-indexed bundles for immediate use, updates the graph in real-time based on directory changes, and operates both as a standalone CLI toolkit and an MCP server. Installation is straightforward via pip, with solutions provided for common issues like PATH errors. The project supports multiple databases including FalkorDB Lite and Neo4j, accommodating numerous programming languages. Users can operate CodeGraphContext in two modes: CLI mode for direct terminal-based code analysis and querying relationships or visualizing graphs, and MCP Server mode to enable natural language queries by AI assistants through configured IDEs or CLI tools. The project, open-sourced under the MIT License, encourages community contributions and discussions on feature enhancements, with detailed guidelines available. Actively maintained by Shashank Shekhar Singh, CodeGraphContext fosters a collaborative space for developers leveraging AI-assisted code analysis. Keywords: #phi4, AI assistants, CLI toolkit, CodeGraphContext, FalkorDB Lite, GitHub, Graph RAG, MCP server, Neo4j, VS Code, code indexing, context-aware, knowledge graphs, natural language queries, repository management, static analysis
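A minimal install sketch; the PyPI package name is assumed to match the project name, and the CLI's subcommands are not reproduced because the summary does not name them.

```bash
# Assumed package name; see the repository for the authoritative install instructions
pip install codegraphcontext
```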
    The google logo   github.com 9 days ago
1733.  HN IDEcline: How the most powerful coding tools became second-class citizens
The article "IDEcline" examines the transformation in the role of Integrated Development Environments (IDEs) as they shift from being central coding tools to platforms that primarily oversee AI-driven agents in software development. Historically, IDEs like Visual Studio and IntelliJ were pivotal due to their features enhancing developer productivity. This centrality is waning with the advent of advanced AI coding tools. The transition unfolded through three distinct phases: initially, AI served as a supplementary tool within IDEs (Wave 1), primarily improving functions such as autocomplete. In the second phase (Wave 2), AI agents were integrated into terminal environments, handling more complex tasks beyond mere code suggestions. The current phase (Wave 3) involves desktop control planes that manage multiple AI agents to execute various development activities, thus shifting the focus from traditional text editors to task dashboards. As IDEs become relegated to "second-class citizens," primarily used for verification and debugging rather than as central hubs, companies like Microsoft, Google, and JetBrains face strategic challenges. These organizations must adapt to a new landscape where agent-first workflows dominate. Critical factors such as security, compliance, and developer trust will determine the success of either standalone control planes or IDE-integrated solutions. The future of software development is increasingly centered on auditing and verifying AI contributions within codebases, representing a shift from traditional editing roles to those emphasizing orchestration and verification. Keywords: #phi4, AI models, IDE, auditing provenance, autocompletion, coding tools, competitive landscape, control planes, desktop applications, long-running jobs, multi-agent tasks, orchestration, parallelism, security compliance, task dashboard, terminal agents, workflows
    The google logo   thenewstack.io 9 days ago
1780.  HN Show HN: Forge – 3MB Rust binary that coordinates multi-AI coding agents via MCP
Forge is an orchestration tool developed in Rust that facilitates coordination among various AI coding agents like Claude Code, Codex CLI, and Gemini CLI. Weighing approximately 3 MB, it addresses prevalent challenges such as file conflicts, knowledge retention issues, and architectural drift by providing a centralized management platform. Its core features include: - **File Locking:** This mechanism prevents multiple agents from editing the same files simultaneously, ensuring seamless collaboration. - **Knowledge Flywheel:** A system for capturing and storing decisions and patterns which can be easily queried to maintain continuity across different sessions. - **Drift Detection:** It evaluates recent changes against a predefined project vision using language models like GPT-4.1, maintaining alignment with the project's specifications. - **Governance:** Conducts health checks on various dimensions such as documentation quality, architecture integrity, and task health to uphold overall project standards. Forge functions as an MCP server via stdio, ensuring compatibility with any AI tool that supports MCP. It features a pluggable "brain" for intelligent decision-making, accommodating both rule-based systems and LLM engines like OpenAI's GPT models. The state is managed through a JSON file located in the `.forge/` directory, making it human-readable and trackable via Git. To set up Forge, users initialize it within their project, generate task plans based on specifications, execute tasks with designated AI tools, and monitor project health through CLI commands or MCP queries. Its architecture supports seamless integration by providing adapters for various supported tools. Licensed under MIT, Forge encourages community contributions to broaden its capabilities, such as adding more brain models or enhancing synchronization processes across different configurations. By unifying multiple AI coding tools under a single orchestration layer, it significantly boosts workflow efficiency and project consistency. Keywords: #phi4, AI integration, AI tools, ASCII dashboard, CLI commands, CLI dispatch, Forge, JSON-RPC 20, LLM engine, MCP server, OpenAI API integration, Rust, actionable findings, architecture, binary size, deterministic operations, drift detection, event logging, file locking, git hygiene, governance, governance score, headless task execution, health check, human-readable cards, intelligent decisions, knowledge base, master plan, multi-agent coordination, orchestration, plan decomposition, pluggable brain, project spec, project state, state reconciliation, statejson, task management, tool adapters, tool inventory, zero runtime deps
    The google logo   github.com 9 days ago
1918.  HN Show HN: SAA – A minimal shell-as-chat agent using only Bash
SAA (Single Action Agent) is a minimalist shell-based chat interface developed as a Go binary, designed to transform terminals into chat environments using only Bash. It was created in response to performance issues and complexity found in existing tools, focusing on simplicity by relying solely on Bash. SAA supports local large language models like GLM-4.7-Flash and manages sessions discreetly without disrupting user workflows. Key features include session management, project-specific configurations, and seamless integration with APIs such as OpenAI. Installation requires Go 1.23 or later, and users can configure it to work with various models through command-line options. The tool encourages customization via scripts and wrappers, allowing for personalized enhancements like UI integrations or notifications. SAA is tailored for Unix users who prefer managing their own sandboxing solutions, such as Docker or bubblewrap, rather than having them built-in. It supports a flexible approach where users can create aliases or build custom chat interfaces to streamline interactions with the agent. As an open-source project under the MIT license, SAA invites community contributions and improvements. Keywords: #phi4, AGENTSmd, Alias, Autonomous Agent, Bash, Bubblewrap, CLI Tools, Chat UI, Chatbot, Configuration, Docker, Ecosystems, Gemini CLI, Go Binary, Installation, LLMs, License, MCP, MIT, OpenAI API Key, Plan Mode, SAA, Sandbox, Session Management, Shell, Shopping Automation, Skills, Sub-agents, Teams, Usage
    The google logo   github.com 10 days ago
1951.  HN Show HN: Agents – Sync MCP Configs Across Claude, Cursor, Codex Automatically
The "Agents" CLI tool streamlines the management of multiple configuration files required for various AI coding assistants such as Codex, Claude, Cursor, and Gemini by centralizing MCP (Model Context Protocol) server configurations into a single source of truth located in `.agents/`. This approach simplifies adding or updating servers across different tools. Key features include a convention-over-configuration design with sensible defaults, a security-first architecture that isolates secrets in a gitignored `local.json`, and an interactive setup wizard to facilitate user onboarding. The tool is rigorously tested with over 70 tests using Vitest. It supports AI coding assistants like Codex, Claude Code, Gemini CLI, Cursor, Copilot, and Antigravity, and can be installed via npm as `@agents-dev/cli` under the MIT license. The quick start process involves installing the CLI tool, initializing it within a project folder, and using commands such as `agents sync` to manage configurations. Users can perform various operations including adding MCP servers, listing them, checking for configuration issues, and auto-syncing changes. The tool enhances existing documentation by offering machine-readable configurations while maintaining human-readable instructions through an `AGENTS.md` file. Community support is available on GitHub where users can report bugs, engage in discussions, and provide feedback about the project. Keywords: #phi4, AGENTSmd, AI coding assistants, API keys, Antigravity, CLI, Claude, Codex, Copilot, Cursor, Gemini, GitHub, MCP, agents folder, agentsjson, bug report, command cheat sheet, configuration, discussion, localjson, multi-LLM development, npm, secrets, skills workflows, star on GitHub, sync, tools
    The google logo   github.com 11 days ago
1978.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a sophisticated management tool designed to handle multiple instances of Claude Code running in isolated Docker containers, ensuring both security and efficiency. It offers an easy setup with sensible defaults and includes a web dashboard that simplifies session management. Each instance operates independently within its own container, providing isolation from the host machine and enhancing security by preventing unauthorized access. Key features of SafeClaw include isolation, allowing each Claude Code instance to run without affecting the host system; lightweight operations for quick spin-up, stop, or deletion of sessions, which is faster than using full virtual machines; portability across any Docker-supported machine for consistent environments; and robust session management that supports multiple parallel research tasks or projects with automatic conversation history storage. The setup process involves building a Docker image and starting containers through scripts. The web dashboard aids in creating, managing, and viewing sessions live. Optional integrations such as Gemini CLI and Slack read access are available to enhance functionality. SafeClaw includes components like Ubuntu 24.04, Node.js 24 (LTS), Claude Code 2.1.32, GitHub CLI, Playwright MCP with Chromium, among others. It securely manages authentication tokens and allows customization of environment variables through scripts. Additionally, the tool provides useful command-line operation aliases within containers, streamlining user interaction and workflow management. Keywords: #phi4, CLI, Chromium, DX plugin, Docker, Gemini, GitHub CLI, Nodejs, Playwright MCP, SafeClaw, Slack, Ubuntu, aliases, authentication, containers, conversation history, dashboard, environment variables, scripts, tmux, ttyd, volume mounts, web terminal
    The google logo   github.com 11 days ago
2030.  HN Show HN: MCP-baepsae – MCP server for iOS Simulator automation
MCP-baepsae is an iOS Simulator automation server tailored for testing iOS applications, particularly beneficial for AI coding agents. It utilizes XCTest private APIs to parse accessibility trees and employs a native Swift bridge to enhance UI operations without the overhead of simctl. The project supports 32 tools designed to meet diverse UI automation requirements across both iOS Simulators and macOS apps. Key features include native Swift integration for improved performance, a comprehensive toolset for various platforms, and a TypeScript MCP layer that facilitates server functionality. Installation can be achieved through npm or directly from the source, with an installer script available to streamline setup on multiple clients. The project necessitates macOS 14+, Xcode + iOS Simulator, Node.js 18+, and Swift 6+, along with accessibility permissions for UI automation features. It supports different runtime environments such as node, npx, bunx, and global, offering manual setup options if the installer script is not utilized. The project's structure includes TypeScript code, native binary output, and test scripts. MCP-baepsae provides end-to-end implementations of 32 tools categorized by platform: iOS Simulator only, macOS only, cross-platform, and utility tools. Usage examples illustrate how to open URLs in the simulator, manage apps, and automate macOS applications. For troubleshooting or architectural discussions, users are encouraged to contact the author. Additional documentation is available in Korean (README-KR.md). Keywords: #phi4, CLI tools, MCP-baepsae, Swift bridge, TypeScript, UI operations, XCTest, accessibility tree, automation, iOS Simulator, macOS app, native binary, simctl, troubleshooting
    The google logo   github.com 11 days ago
2067.  HN Automated AI research setup (Clawdbot/OpenClaw and vibecoding)
The author engineered a lightweight, AI-driven research pipeline by integrating the OpenClaw/Clawdbot system with a vibe-coded, JSON-based scheduler and Gemini CLI, letting experiments run on modest hardware such as a Raspberry Pi while heavy compute is offloaded to cloud coding agents (e.g., Jules). The workflow is triggered via Telegram, where the bot can generate and merge PRs, launch jobs in tmux, and queue them on a mini-cluster. Intent, reproducibility, and results are recorded in a SQLite experiment notebook that supports logs, suggestions, and multi-machine commands, augmented by a Tailscale-hosted dashboard that tracks cluster status and job history and will soon add utilization metrics. The author initially relied on ad-hoc SSH/rsync/tmux scripts that suffered from messy environments and a lack of queuing; the custom tooling added self-repair (e.g., recovering wiped lists from git history), idle-machine use via cron-driven prompts, and autonomous experiment generation when no human input arrives. The whole setup is deliberately low-cost and throwaway, aimed at quickly filtering hypotheses rather than producing polished releases, reflecting the author's emphasis on system design over code, a willingness to accept buggy or false results, and gratitude to a grandfather for the original Raspberry Pi that inspired the project. Keywords: #gpt-oss:20b, Clawdbot, Gemini CLI, JAX, JSON, RL, Raspberry Pi, VPS, compute cluster, jobs, queue, rsync, scheduler, ssh, tmux, vibecoding
    The google logo   jessesilverberg.com 12 days ago
2108.  HN Show HN: Open-source UI components and widgets to build MCP apps for ChatGPT
Show HN presents mcp-ui-starter, an open-source UI component framework for building Model Context Protocol (MCP) applications that can interface with ChatGPT, Claude, Gemini, and other AI clients. The guide walks through cloning the repository, installing dependencies, and launching a local development server that exposes an MCP endpoint at `/mcp` along with Flowbite-powered widgets. The local server is then made public via ngrok (e.g., `ngrok http 3000` yields a URL like `https://<id>.ngrok-free.app/mcp`) and registered with AI platforms: a connector in ChatGPT's Developer mode, a custom connector in Claude's settings, or CLI commands such as `gemini mcp add --transport http <name> "<ngrok-url>/mcp"`, with analogous commands for Cursor, VS Code, Claude Code, Mistral AI, Codex, and other tools; once registered, each platform can discover and use the MCP server's tools. The guide also explains how to create a new widget by adding a server-side component that exports a Zod-validated configuration (e.g., a "basic-text" widget returning "Hello, world!") and a corresponding front-end React component that renders the widget's output, then registering the widget with the server via `.registerWidget()`. Finally, Flowbite UI components can be themed by importing one of the built-in CSS files (Default, Minimal, Enterprise, Playful, Mono) or by customizing Tailwind CSS variables in `index.css`. Keywords: #gpt-oss:20b, AI, Bun, ChatGPT, Flowbite, MCP, NGROK, NPM, Open-source, PNPM, SDK, Skybridge, UI components, Yarn, widgets
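The tunnelling and registration steps quoted above, collected into one sketch; the port and server name are illustrative.

```bash
# Expose the local dev server's /mcp endpoint publicly
ngrok http 3000

# Register the resulting URL with Gemini CLI (command quoted in the summary)
gemini mcp add --transport http my-widgets "https://<id>.ngrok-free.app/mcp"
```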
    The google logo   flowbite.com 12 days ago
2145.  HN Gemini CLI v0.27.0
Gemini CLI v0.27.0 indicates that JavaScript is disabled in the current browser, which prevents the use of x.com, and it advises users to enable JavaScript or switch to a supported browser, directing them to the Help Center for a list of compatible browsers. Keywords: #gpt-oss:20b, Gemini CLI, Help Center, JavaScript, browser, continue, disabled, enable, list, supported browsers, switch, v0270, xcom
    The google logo   twitter.com 12 days ago
2147.  HN Show HN: Agent-smith – Auto-generate AGENTS.md for AI coding assistants
Agent-smith is a zero-config TypeScript CLI (`npx @jpoindexter/agent-smith`) that scans a JavaScript/TypeScript codebase and automatically produces an `AGENTS.md` file, a structured context document AI coding assistants can use to understand a project without manual configuration. It extracts metadata such as component props, complexity, client-only hooks, API routes with auth status, database models and relations, design tokens, and import graphs, and also generates "critical rules" with wrong/right code examples to enforce consistent patterns, yielding roughly 10k tokens of concise, structured context versus 100k+ tokens of raw code. The tool supports multiple output modes (default, compact, compress, minimal, XML, tree), numerous flags for customizing output, dry-run preview, clipboard copying, inclusion of diffs or git logs, splitting large repos, security checks, and monorepo support, plus a built-in MCP server exposing `pack_codebase`, `read_agents`, `search_components`, and `get_component_info` actions for AI assistants. It can be run directly with `npx @jpoindexter/agent-smith`, installed globally, or pointed at specific directory paths; the project is hosted on GitHub at https://github.com/jpoindexter/agentsmith. Keywords: #gpt-oss:20b, AGENTSmd, AI, API routes, Agent-smith, CLI, JSDoc, JSON, Nextjs, Prisma, React components, Remote, Tailwind, TypeScript, Zustand, codebase, components, hooks, shadcn/ui, tRPC
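The documented invocations, with an illustrative directory argument; output-mode flags are omitted because their exact spellings are not given here.

```bash
npx @jpoindexter/agent-smith                  # scan the current repo and emit AGENTS.md
npx @jpoindexter/agent-smith ./packages/web   # point the scan at a specific directory
```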
    The google logo   github.com 12 days ago
2254.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
No summary available (error)
       github.com 13 days ago
2290.  HN State of Flutter 2026
Flutter 2026 shifts the rendering engine to default Impeller on Android (API‑29+) and drops Skia support on iOS while retaining Skia for the web; Impeller delivers a 30–50 % reduction in shader‑jank, 20–40 % better text rendering, and cuts the dropped-frame rate from 12 % to about 1.5 %, marking its biggest performance win yet. Developers should migrate immediately, audit widget dependencies for the forthcoming Material/Cupertino split in Q2 2026, and integrate the new v1.0 Flutter AI Toolkit—chat, multi‑turn function calls, speech‑to‑text—and GenUI SDK alpha for LLM‑driven UIs. Benchmarking shows Flutter’s memory usage (25 MB iOS, 14 MB Android) lies between native (smaller) and React Native (larger) systems, while Avalonia’s partnership with Google enables Impeller integration for .NET, delivering power savings and faster starts via Vulkan‑based graphics context with graceful fallbacks to OpenGL ES or Skia. In 2024‑25 the “Flock” fork by former core dev Matt Carroll amplified community frustration over sluggish desktop support and slow PR triage, prompting the official team to accelerate backlog remediation. The professional AI roadmap unveiled at Google I/O 2025 positions Flutter as the platform for “agentic apps” where an LLM dictates UI state, supported by tools such as the Flutter Extension for Gemini CLI, the Dart & Flutter MCP Server, Antigravity’s experimental IDE layer, and Firebase AI Logic SDK. IDE integration extends to Android Studio Meerkat’s Gemini code completion and VS Code/IntelliJ Gemini Assist, while a multimodal Flutter AI Playground showcases text, image, and chat prototypes. The release cadence targets Flutter 3.41 with Dart 3.11 in February 2026, a mid‑2026 release of Flutter 4.0 contingent on core design decoupling, and a schedule of four stable and twelve beta releases with no in‑flight code‑push. Key enhancements include a smaller core, deeper Material Design 3 integration, native‑like desktop UI support, modular app sizes, and further Impeller optimisations. Upcoming priorities encompass migrating the web engine to WebAssembly by H1 2026, standardising Swift Package Manager for iOS plugins, preparing a 10‑foot TV‑optimized layout for LG WebOS in H1 2026, and aligning with new OS releases (iOS 20, Android 17 “Cinnamon Bun”) while embracing fold‑screen and advanced accessibility. The community calendar highlights October 7‑9 2026 as the Next.App DevCon in Berlin for foldable and multi‑window testing and for validating emerging Impeller desktop preview flags. Other milestones involve GenUI’s beta transition, Antigravity IDE preview, the Model Context Protocol for unified IDE/CI AI communication, a refreshed DevTools “Inspector 2.0”, and ongoing build‑time and startup optimisations. Finally, the ecosystem remains attentive to a potential Flutter Foundation for governance, growing IoT interest on Arduino/Raspberry Pi, and a proliferating favorites ecosystem now requiring 2FA, with packages ranging from Rust wrappers to advanced charting libraries—prompting proactive audits and dry‑runs today to stay ahead of the 2026 migration window. Keywords: #gpt-oss:20b-cloud, AI, AOT, Android, Cupertino, Flutter, Impeller, LLMs, Material, Skia, Vulkan, iOS, shader compilation
    The google logo   devnewsletter.com 13 days ago
2313.  HN Show HN: AgentGuard – Open-source security layer for AI agents and skills
AgentGuard is a free, open‑source real‑time security layer for AI agents that blocks malicious skills and prompt‑injection attacks by intercepting dangerous file, terminal, and network operations. Its Layer 1 automatically forbids destructive commands (e.g., `rm -rf /`, fork bombs), protects critical files (.env, .ssh/), blocks data exfiltration to webhooks, and logs the initiating skill. Layer 2 provides on‑demand static analysis of new skills using 24 detection rules that cover secrets, backdoors, obfuscation, prompt injection, and a wide range of Web3 exploits (wallet draining, unlimited approvals, reentrancy, flash‑loan risk, etc.), and it also supplies a trust registry for capability‑based access control. The tool ships with a straightforward npm install or git‑clone setup, offers CLI commands such as `/agentguard scan`, `/agentguard action`, `/agentguard trust list`, `/agentguard report`, and `/agentguard config` to adjust protection levels—strict, balanced, or permissive—and is compatible with Claude Code (via pre/post tool use hooks), OpenAI Codex, Gemini, Cursor, and GitHub Copilot. A recent scan of the example “vulnerable‑skill” repository demonstrates a critical risk level with hits across JavaScript, Solidity, and Markdown, while the upcoming 1.1 release will add Trojan‑skewed `SKILL.md` detection, Markdown scanning, and base‑64 payload decoding. Version 3.0 introduces Markdown capability scanning, an open‑source plugin manifest, a federated trust registry, shared C2 domain/IP blocklists, automated marketplace checks, a VS Code extension, and community rule contributions, all licensed under MIT. Keywords: #gpt-oss:20b-cloud, AI agents, AgentGuard, Deep Scan, Web3, backdoor, credentials, exfiltration, malicious skill, open-source, prompt injection, reentrancy, security layer, wallet draining, webhook
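The slash commands listed above, annotated; these run inside a supported agent (e.g., Claude Code) rather than in a plain shell, and `/agentguard action` is omitted because its arguments are not described here.

```
/agentguard scan          # on-demand Layer 2 static analysis of installed skills
/agentguard trust list    # inspect the capability-based trust registry
/agentguard report        # review blocked operations and findings
/agentguard config        # switch between strict, balanced, and permissive modes
```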
    The google logo   github.com 13 days ago
2326.  HN Show HN: Remote AI coding without moving your code – CloudForge
CloudForge is a web‑based UI that lets users run popular AI coding tools—Claude Code, Codex CLI, Aider, Gemini CLI—directly on their own servers without transferring code off‑premises. By connecting a lightweight, forthcoming open‑source agent, the platform supplies a web terminal via xterm.js and embeds the Monaco editor, removing the need for SSH port forwarding. A free tier supports one Bring‑Your‑Own‑Server (BYOS) instance, and the service includes AI‑auth management for API keys, with one‑click deployment available through its website. Keywords: #gpt-oss:20b-cloud, AI Auth, API keys, Claude Code, CloudForge, Codex CLI, Gemini CLI, Monaco, Remote AI, SSH, Show HN, web UI, xtermjs
    The google logo   cloud-forge.me 13 days ago
2380.  HN Show HN: Toktrack – 1000x faster AI CLI cost tracker (Rust and SIMD)
Toktrack is a Rust‑based, SIMD‑optimized command‑line application that aggregates token usage and cost across Claude Code, Codex CLI, and Gemini CLI, solving the slow throughput of existing tools (over 40 s for 3 GB of JSON logs), data loss caused by Claude Code’s 30‑day session purge, and fragmented logs across multiple interfaces. By leveraging simd‑json and Rayon, it parses up to ~3 GiB/s, yielding a first run in ~1 s and cached queries in ~0.04 s—up to 1000× faster than baselines—while persisting immutable daily summaries in a ~/.toktrack/cache directory that outlasts CLI data deletions. A text‑UI dashboard with four tabs (Overview, Models, Daily, Stats) offers daily, weekly, and monthly breakdowns, and the same command set (e.g., daily, monthly, stats, help) works uniformly across supported CLIs; machine‑readable JSON output is obtainable with a `--json` flag. Installation is straightforward via `npx toktrack` (auto‑downloaded binary) or `cargo install --git https://github.com/mag123c/toktrack`, and prebuilt binaries exist for macOS, Linux, and Windows. Typical usage includes launching the dashboard with `npx toktrack`, querying today’s cost with `npx toktrack daily --json`, or obtaining a monthly summary with `npx toktrack monthly --json`. Navigation uses Tab/Shift+Tab, j/k, with `q` to quit and `?` for help. The cache structure houses per‑CLI daily JSONs and a pricing.json with a 1‑hour TTL; the cold path builds the cache from all files, while the warm path updates only modified files from the last 24 h. By caching immutable summaries, Toktrack preserves usage history against retention policies such as Claude Code’s 30‑day cleanup and Codex CLI’s size caps. Future roadmap includes OpenCode support, with contributions encouraged under the MIT license. Keywords: #gpt-oss:20b-cloud, AI CLI, Claude Code, Codex CLI, Gemini CLI, Rust, SIMD, TUI, Toktrack, benchmarks, cost history, cost summaries, cost tracker, dashboard, parallel, performance, persistent cache, pricing, processing, rayon, simd-json, throughput, token usage
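The commands quoted in the summary, gathered into one block.

```bash
npx toktrack                    # launch the TUI dashboard (binary auto-downloaded)
npx toktrack daily --json       # today's token usage and cost as machine-readable JSON
npx toktrack monthly --json     # monthly roll-up as JSON

# Build from source instead of using the prebuilt binary
cargo install --git https://github.com/mag123c/toktrack
```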
    The google logo   github.com 13 days ago
2408.  HN Evolve SDK – Open-Source Manus Powered by Claude Code, Codex CLI, Gemini CLI
The speaker indicates their willingness to assist in preparing a concise summary and requests the recipient to provide the specific email address that should be included. Keywords: #gpt-oss:20b-cloud, Claude Code, Codex CLI, Evolve SDK, Gemini CLI, Manus, Open-Source, address, contacted, email, feedback, input
    The google logo   github.com 14 days ago
   https://github.com/evolving-machines-lab/manus-evolve   14 days ago
   https://github.com/evolving-machines-lab/evolve   14 days ago
2413.  HN Choosing Antigravity or Gemini CLI
The Antigravity IDE is a full-featured agent manager aimed at users who prefer a graphical workflow. It offers an offline GUI installation with no prerequisites, centralized agent orchestration through a dashboard, a strongly opinionated spec-driven development style with live walkthroughs, and native debugging capabilities. Extensibility comes via Open VSX extensions, MCP, and Agent Skills, all integrated into a single interface that hosts an embedded browser and provides visual feedback and debugging hooks. Gemini CLI, by contrast, excels in lightweight, headless, or script-driven scenarios such as CI/CD pipelines and terminal-based automation. It requires Node.js and installs via `npm install -g @google/gemini-cli`, runs in separate terminals or tmux sessions, is configurable with extensions and Agent Skills, and works either with direct tool calls (e.g., GitHub, gcloud) or in a headless mode that writes output to the console. Both tools are mature, free to try, and can coexist in one workflow; the choice hinges on whether a user prefers an IDE-style visual environment for orchestrating multiple agents or a purely command-line, automation-friendly approach for scriptable, rapid deployment. Keywords: #gpt-oss:20b-cloud, Antigravity, CI/CD, Gemini CLI, IDE, Nodejs, Open VSX, agent manager, agent skills, free tier, headless mode, installation, multiple agents, npm, terminal
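For the CLI route, installation is as quoted above; the one-shot `-p` flag shown for headless use is an assumption about Gemini CLI's prompt mode, not something stated in the summary.

```bash
npm install -g @google/gemini-cli         # install (command from the summary)

gemini                                    # interactive session in the terminal
gemini -p "summarize the failing CI job"  # headless one-shot mode for scripts (flag assumed)
```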
    The google logo   cloud.google.com 14 days ago