Scraper Spider


2026-02-18 17:27
gemini cli
gemini cli stories from the last 14 days
49.  HN Agent Skills 101: a practical guide for engineers
"Agent Skills 101: A Practical Guide for Engineers" offers a structured methodology to enhance AI agents' capabilities within engineering teams by developing skills as markdown files (SKILL.md) containing procedural knowledge tailored to team-specific needs. These skills enable AI agents to consistently apply the correct procedures without requiring constant guidance, addressing context gaps in problem-solving related to tools, deployment processes, and testing strategies. The guide introduces a three-phase skill loading system—metadata, instructions, and resources—to optimize token usage and prevent cognitive overload. A SKILL.md file comprises YAML frontmatter for metadata and a markdown body detailing executable procedures, with optional fields like allowed-tools that can restrict tool usage during tasks. The description field serves as the trigger for skills, written in third person to ensure activation based on relevance without prematurely revealing details. Skills are organized at project, personal, or extension levels, with project-level precedence in shared environments. They differ from other technologies such as custom instructions, AGENTS.md, prompt files/commands, MCP servers, bundles, and workflows by focusing on task-specific procedural knowledge and activation relevance. Bundles group related skills for roles or projects, while workflows sequence multiple skills into comprehensive procedures. Installation and management of community skills are facilitated via a CLI tool (`npx skills add`), with storage in directories like `.skills.sh` or `.github/skills/`. The guide advises reviewing `SKILL.md` files to ensure quality and safety before installation due to the unmoderated nature of public community skills. Platform-specific management varies, with VS Code providing a diagnostics view for issue identification, Claude Code supporting auto-discovery, Gemini CLI requiring user consent for activation, and Cursor allowing toggling of Agent Skills in settings. Validation is achievable using `npx skills-ref validate`, ensuring compliance with frontmatter structure and field constraints. Skill catalogs aid in managing extensive collections by listing available skills alongside categories and keywords, while bundles assist in skill discovery and learning paths. Workflow patterns prioritize documentation over specifications to link multiple skills into multi-step procedures like "Ship a feature." The guide emphasizes concise `SKILL.md` descriptions (under 1,024 characters) and body text limits (200 words or under 500 lines for frequently-loaded and standard skills, respectively). Creating a skill involves identifying repetitive tasks, setting up directories, writing SKILL.md with name, description, workflow, and rules, and refining trigger conditions through testing. Platform-specific notes highlight differences in skill loading, validation support, and management features across tools like VS Code, Claude Code, Cursor, Gemini CLI, and OpenAI Codex, ensuring effective integration of skills into engineering workflows. 
Keywords: #phi4, AGENTSmd, AI agents, Agent Skills, CLI tools, Cursor Rules, MCP servers, Markdown body, Progressive Disclosure, YAML frontmatter, agent consent, allowed-tools, authentication, bundles, community, compatibility, context efficiency, cross-agent communication, custom instructions, documentation, domain expertise transfer, engineers, environment requirements, extension skills, installation, instructions, live data access, metadata, mistakes, patterns, personal skills, platform, portability, power cord, procedural knowledge, project skills, prompt files, real-time streaming, references, resources, rules, skill activation, skill authoring, skill catalog, skill directory, skill discovery, skill management, skill storage, storage locations, tags, tooling, triggers, user manual, validation, verification steps, workflows, write operations
    gist.github.com 4 hours ago
118.  HN Show HN: SentinelGate – Universal Firewall for AI Agents (Open Source, Go)
SentinelGate is an open-source firewall developed in Go, specifically designed to enhance security for AI agents by intercepting and controlling access to various machine operations like tool calls, shell commands, file access, and HTTP requests. It employs Role-Based Access Control (RBAC) via Common Expression Language (CEL) policies, ensuring a detailed audit trail of all activities. Key features include acting as an intermediary that evaluates actions against predefined policies without requiring code changes to the AI agent’s codebase. SentinelGate offers quick setup on macOS, Linux, and Windows platforms, either through a script or by building from source. The Admin UI facilitates policy creation, management, and access to audit logs without needing configuration file edits. It enforces deterministic rules to prevent unauthorized operations, such as blocking simple tool patterns like `delete_*`. Detailed logging records actions with identity, decision, timestamp, and arguments. Users can manage policies and monitor AI agent activities using a browser-based UI, with options to run SentinelGate as either an MCP proxy for agents or a standalone MCP server. Despite its effectiveness in preventing accidental misuse or prompt injection by AI agents, it is not an OS-level sandbox and thus may be bypassed by malicious processes. Commercial offerings under SentinelGate Pro include additional features like Single Sign-On (SSO), Security Information and Event Management (SIEM) integration, and compliance reporting. The project is open-source under the AGPL-3.0 license, with commercial options available via sentinelgate.co.uk, and encourages contributions following guidelines in the CONTRIBUTING.md file. Keywords: #phi4, AI agents, API keys, Admin UI, CEL policies, Go, HTTP requests, MCP tool calls, Open Source, RBAC, SIEM integration, SSO, SentinelGate, Universal, audit trail, compliance reports, configuration, firewall, limitations, proxy, runtime hooks, sandbox, security, shell commands
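As a purely illustrative sketch of the kind of CEL-based deny rule described (blocking tool patterns such as `delete_*`), a policy might be shaped roughly like this; the field names and schema here are hypothetical, not SentinelGate's actual configuration format.

```yaml
policies:
  - name: block-destructive-tools
    # CEL expression evaluated against the intercepted action (schema assumed for illustration)
    match: 'action.kind == "tool_call" && action.name.startsWith("delete_")'
    decision: deny
  - name: allow-read-only-files
    match: 'action.kind == "file_access" && action.mode == "read"'
    decision: allow
```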
    github.com 8 hours ago
143.  HN Show HN: Disco Checkers
Disco Checkers is a dynamic terminal-based checkers game crafted in Python 3 that operates without any extra installation requirements. Utilizing the Gemini CLI and Gemini 3 Flash model, it offers a unique dual-perspective view of the board for both Red's and Black's players. The game distinguishes itself with vibrant disco-inspired aesthetics, including an animated header, walking lights border, flashing king squares, and dynamically changing colors on special squares. Built using an Immutable Core / Imperative Shell architecture, Disco Checkers ensures reliable state management through dataclass definitions, pure functions for move calculations, and efficient rendering with ANSI colors. Thoroughly tested with unit tests that cover game rules, complex scenarios, visual effects, and string manipulation utilities, the game requires Python 3.7 or higher and a terminal capable of handling Unicode and ANSI color codes. To play, users simply run `python3 main.py`, choosing either human or CPU opponents for each side and making moves via displayed hotkeys, with the option to exit by pressing 'q'. The project is open-source under the MIT license. Keywords: #phi4, ANSI Colors, ANSI Utilities, Dataclass Objects, Disco Checkers, Dual Perspective, Event Loop, Gemini CLI, Immutable State-Machine, King Promotion, Multi-Jumps, One-Touch Input, Pure Functions, Python3, TTY State, Terminal Game, Unicode Support, Unit Tests, Vibe-coding, Visual Effects
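The Immutable Core / Imperative Shell split mentioned above can be sketched in a few lines of Python: state lives in frozen dataclasses and moves are pure functions that return new state. The field names below are illustrative, not Disco Checkers' actual definitions.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Piece:
    row: int
    col: int
    color: str          # "red" or "black"
    is_king: bool = False

@dataclass(frozen=True)
class GameState:
    pieces: Tuple[Piece, ...]
    turn: str           # whose move it is

def move_piece(state: GameState, piece: Piece, row: int, col: int) -> GameState:
    """Pure function: returns a new GameState and never mutates the old one."""
    moved = replace(piece, row=row, col=col,
                    is_king=piece.is_king or row in (0, 7))   # simplified back-rank promotion
    others = tuple(p for p in state.pieces if p is not piece)
    next_turn = "black" if state.turn == "red" else "red"
    return GameState(pieces=others + (moved,), turn=next_turn)
```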
    github.com 10 hours ago
189.  HN Show HN: DevDay – End-of-day recap for AI coding sessions
DevDay is a privacy-focused tool designed for developers utilizing multiple AI coding assistants such as OpenCode, Claude Code, and Cursor. It offers end-of-day recaps of AI-assisted coding sessions by analyzing local session data in conjunction with git commits, thereby facilitating the creation of standup-ready summaries through integrations with platforms like Concentrate AI, OpenAI, or Anthropic. Key features include local-only operation for enhanced privacy, detailed insights into tokens used, estimated costs, duration, and models per session, as well as session grouping by project with associated git commit displays. Users can optionally generate first-person standup messages to streamline reporting. To use DevDay, developers must install it via npm using the command `npm install -g devday`, after which they can access daily recaps or summaries through various commands such as `devday`, `devday -d [date]`, or `devday --standup`. The tool is optimized for macOS and supports further customization by cloning its repository, building it, and linking it. Optional LLM summaries necessitate the configuration of API keys from Concentrate AI (recommended), OpenAI, or Anthropic, with Concentrate AI providing free credits to offset summarization costs over extended periods. DevDay estimates session durations based on message processing times and calculates costs using token counts when not directly provided by tools, thus offering comprehensive insights into development workflows. Keywords: #phi4, AI coding sessions, API key, Anthropic, Concentrate AI, DevDay, OpenAI, git commits, local data, macOS support, npm install, session recap, standup summaries, token counts
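The commands named in the summary cover the typical flow; the date value below is just an example.

```sh
npm install -g devday      # install the CLI (macOS)
devday                     # end-of-day recap for today's AI coding sessions
devday -d 2026-02-17       # recap for a specific date
devday --standup           # first-person standup summary (requires an API key, e.g. Concentrate AI)
```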
    github.com 20 hours ago
320.  HN Show HN: Discoding – run AI CLIs locally, relay them to Discord
Discode is a locally-run tool designed to integrate AI coding Command Line Interfaces (CLIs) within tmux sessions, allowing real-time output relayed directly to messaging platforms like Discord or Slack. Developed as an evolution from OpenClaw, it focuses on conversational control rather than full autonomy by embedding AI CLI interactions into these communication channels. The key features of Discode include a relay-only architecture that avoids additional abstraction layers, support for multiple AI agents such as Claude Code and Gemini CLI, automatic detection of installed AI agents, project isolation with dedicated messaging channels, and the ability to manage several projects using a single Discord bot connection. Technically, it operates locally without cloud dependencies, utilizing persistent tmux sessions that remain active across disconnections. Written in TypeScript, Discode employs a dependency injection pattern for enhanced testability and is compatible with macOS (as developed), Linux, and Windows through WSL, though not natively on Windows due to the absence of tmux support. Installation can be achieved globally via npm or Bun commands, through binary installation using curl without needing Node runtime, or by sourcing from the GitHub repository. Users must ensure they have the requisite prerequisites such as tmux version 3.0+, Bun version 1.3+, and a configured Discord bot with specific permissions and intents. Discode offers user-friendly features like automatic setup commands, session management tools, and CLI references to streamline integration into existing workflows. The project is open for contributions under the MIT License, emphasizing strict adherence to TypeScript standards. By enabling developers to interface with AI CLIs remotely via Discord, Discode enhances workflow efficiency and provides greater control over coding tasks. Keywords: #phi4, AI CLIs, Bun, Discoding, Discord, OpenClaw, TypeScript, conversational control, daemon process, multi-agent support, persistent sessions, project isolation, real-time streaming, tmux
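Conceptually the relay sits on top of ordinary tmux plumbing, which Discode automates; the snippet below only illustrates that underlying idea with a hypothetical session name, and is not Discode's own interface.

```sh
tmux new-session -d -s myproject 'claude'                   # run an agent CLI in a persistent, detached session
tmux send-keys -t myproject 'fix the failing test' Enter    # input that would arrive from a Discord message
tmux capture-pane -p -t myproject | tail -n 20              # output that would be relayed back to the channel
```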
    github.com a day ago
337.  HN Show HN: Mtb – An MCP sanity checker for vibe coding
"Make the Bed (mtb)" is an MCP sanity checker for AI-driven coding projects inspired by a Calvin & Hobbes comic strip, aimed at preventing "vibe-coded" projects—those created with enthusiasm but without considering existing solutions or maintenance costs. It guides developers using structured questions and complexity metrics to favor established tools over reinvention. The tool features several key components: **Consult**, which employs a 5 whys framework for evaluating new features; **Stats**, providing software composition analysis for complexity and COCOMO cost estimates; **Checklist**, ensuring operational readiness through checks like CI/CD, monitoring, and documentation; and **Compare**, analyzing the impact of changes on code complexity and maintenance. mtb integrates with environments such as VS Code and OpenAI Codex and is open-source under the MIT license, promoting contributions while prioritizing simplicity. It exemplifies its principles by using lightweight dependency scanning tools in self-assessments, advocating for thoughtful development that emphasizes problem-solving over unnecessary complexity, akin to making the bed rather than building a robot to do it. Keywords: #phi4, AI agents, CI/CD, CLI tool, COCOMO, GitHub, Go vet, MCP, Make the Bed, Socratic method, Syft dependency, automated tests, code analysis, complexity metrics, cyclomatic complexity, dependencies, deployment pipeline, documentation, govulncheckExtracted Keywords: Make the Bed, govulncheckKeywords: Make the Bed, monitoring, on-call, operational readiness, sanity checker, scc, security audit, software maintenance, transitive modules, vibe coding
    github.com a day ago
350.  HN Show HN: Context Lens: View your CLI's agent context in realtime
**Context Lens** is a local proxy tool designed for developers to analyze and visualize how their coding tools interact with Large Language Models (LLMs) in real-time, without necessitating code modifications. It supports various tools such as Claude Code, Codex, Gemini CLI, Aider, and Pi by capturing API calls during usage. Key features include the ability to break down a session's context window into components like system prompts and tool results, track costs per turn or session, and differentiate interactions between main agents and subagents through conversation threading. It also offers insights into token usage and cost distribution among different agents, as well as visual tools for understanding changes in context over time. The installation of Context Lens can be achieved globally via `pnpm` or `npm`, or run directly using `npx`. Users must set up specific environment variables to direct traffic through the proxy. It supports reverse proxies for HTTP and mitmproxy for HTTPS interception, catering especially to tools like Codex, with configurable CLI options for privacy settings and UI management. Context Lens is particularly beneficial for developers seeking to understand the financial aspects of using coding agents by analyzing context composition rather than just token usage. Its local operation ensures data privacy without reliance on cloud services, making it suitable primarily for individual optimization efforts rather than team or production-level monitoring. In contrast with observability tools like Langfuse and Braintrust that require code instrumentation, Context Lens captures API interactions transparently as a proxy. It includes features to identify potential issues such as large tool results and overflow risks while supporting automatic tool recognition. Sessions are stored locally with options for data reset via the UI, and it adheres to an MIT license for open-source use. Keywords: #phi4, CLI, Context Lens, HTTPS interception, LHAR export, LLM API, coding tools, conversation threading, cost tracking, installation, privacy mode, proxy, reverse proxy, token usage
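The proxy is wired in by pointing a tool's base-URL environment variable at it; the package name, port, and exact variable names below are assumptions for illustration, so check the project's README for the real values.

```sh
npx context-lens                                   # start the local proxy (package name assumed)
export ANTHROPIC_BASE_URL="http://localhost:4000"  # route Claude Code traffic through the proxy (port assumed)
export OPENAI_BASE_URL="http://localhost:4000"     # route OpenAI-compatible tools the same way
claude                                             # use the coding tool as usual; calls now show up in the UI
```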
    github.com a day ago
351.  HN Show HN: Proxima – local open-source multi-model MCP server (no API keys)
Proxima is an open-source local multi-model AI orchestration server designed to facilitate the connection and management of various AI providers through a single endpoint, eliminating the need for API keys. It enables users to interact with multiple AI models like ChatGPT, Claude, Gemini, and Perplexity using existing browser sessions, supporting tasks such as chat, search, translation, and coding. Proxima's main features include access via a unified endpoint (`/v1/chat/completions`), ensuring privacy by running locally on the user’s machine, and compatibility with multiple AI providers through an intelligent routing system that selects the best provider based on availability and task requirements. The platform offers over 45 Model Context Protocol (MCP) tools for diverse functionalities like content analysis, session management, and file handling. To get started, users can download Proxima via GitHub or install it directly by running `npm start`. Configuration involves logging into AI providers through a local interface and setting up MCP in supported environments such as VS Code. The system is versatile, supporting HTTP requests and SDKs for Python and JavaScript, making it adaptable to various development needs. It integrates with applications like Cursor, VS Code, and Gemini CLI via configurable MCP server commands and provides comprehensive documentation and troubleshooting resources. Proxima's license restricts its use to personal, non-commercial purposes, emphasizing privacy and user control over data interactions. In essence, Proxima serves as a flexible local gateway for managing multiple AI services seamlessly within development environments without compromising privacy or requiring external API credentials. Keywords: #phi4, AI providers, API keys, CLI tools, Electron app, JavaScript, MCP server, OpenAI-compatible, Proxima, Python, REST API, SDKs, Smart Router, architecture feedback, browser sessions, local gateway, multi-model, non-commercial use, orchestrate workflow, reliability observability, troubleshooting
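Since the endpoint is OpenAI-compatible, a request would look roughly like the following; the port number and model name are assumptions, and only the `/v1/chat/completions` path comes from the summary.

```sh
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "auto",
        "messages": [{"role": "user", "content": "Summarize the README of this repo"}]
      }'
```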
    github.com a day ago
355.  HN Codex CLI vs. Claude Code on Autonomy
Srihari Sriraman's blog post on Nilenso examines the contrasting autonomy levels of Codex CLI and Claude Code, two coding agents, highlighting how system prompts influence their behaviors and operational approaches. Codex identifies as a "coding agent" focused on achieving goals collaboratively with users, whereas Claude positions itself more as an interactive tool for assisting user tasks. While Codex exhibits higher autonomy by persisting in task completion without constant user input, Claude encourages interaction through questions and seeking clarifications from users. Codex is characterized by its support for proactive actions and creative problem-solving, especially in the absence of prior context. In contrast, Claude favors a cautious approach that emphasizes simplicity and discourages over-engineering. Philosophically, Codex prioritizes task completion even with minimal user consent, whereas Claude stresses alignment with user preferences, requiring approval before proceeding. The post underscores system prompts as critical in directing these AI models' behaviors, suggesting the behavioral differences stem from how each model interprets such instructions. This analysis illuminates that understanding system prompts can provide deeper insights into the functionalities and intended applications of AI tools like Codex and Claude. Keywords: #phi4, AI tools, Claude Code, Codex CLI, RL (Reinforcement Learning), ambition, autonomy, coding agent, collaboration, identity, inference, interactive mode, model behavior, non-interactive mode, persistence, post-training, proactiveness, restraint, software engineering tasks, system prompts, task completion, user alignment
    blog.nilenso.com a day ago
455.  HN Now I see why OpenClaw is popular
OpenClaw is emerging as a significant tool for startups navigating the competitive AI sector by facilitating connections between AI providers and messaging tools while managing computer operations. Its primary advantage lies in streamlining development processes, allowing companies to avoid building custom solutions from scratch; one startup, for example, had previously built its own Express.js websocket server wired to the Gemini CLI for exactly this purpose. OpenClaw provides vendor independence along with well-documented integration options, improving security and ease of maintenance for its users. For one startup, it enables a user-friendly agent feature accessible to non-technical users, while another utilizes it as a backend system to handle JSON manipulation tasks. By integrating OpenClaw, both companies can concentrate on innovation rather than infrastructure concerns, thereby addressing specific needs in AI application management and change management more efficiently and creatively. Keywords: #phi4, AI agents, CTO, Expressjs, Gemini CLI, Hetzner, JQ, JSON, OpenClaw, agentic AI, change managers, chat interfaces, chokidar, computer control, creativity gateway, development experience, infrastructure, messaging tool, non-technical users, provider abstraction, startups, vendor-independent, websocket server
    tornikeo.com 2 days ago
504.  HN Show HN: Fuelcheck CLI – Monitor token usage across the modern AI providers
Fuelcheck CLI is a command-line utility developed in Rust designed for monitoring and managing token usage across various AI providers, offering data outputs compatible with text or JSON formats suitable for dashboards and scripts. It features multi-provider checks, automation-friendly JSON outputs, local cost scanning capabilities, live TUI watch mode, and the ability to customize provider sources using options like OAuth, web, API, CLI, and local. To install, users can use `cargo install fuelcheck-cli` or build from source with `cargo build --release`. Configuration is initiated via `fuelcheck-cli setup`, which auto-detects local credentials for providers such as Codex, Claude, and Gemini. Users can retrieve usage data using `fuelcheck-cli usage` and calculate costs with `fuelcheck-cli cost --provider codex`. The live watch mode can be activated through `fuelcheck-cli usage --watch`. Configuration files allow users to specify provider details including ID, source type (e.g., OAuth, API), and optional elements like cookies or API keys. The setup process varies based on the authentication method and is detailed in the tool's documentation for each supported AI provider. Fuelcheck CLI supports a wide array of providers including Codex, Claude, Gemini, Cursor, Factory (Droid), MiniMax, Kimi, Copilot, Kiro, Vertex AI, JetBrains AI, Amp, Warp, and OpenCode, enabling users to tailor their monitoring setups through environment variables or configuration files according to specific provider requirements. Keywords: #phi4, AI, AI providers, API, API key, CLI, CodexBar, Fuelcheck CLI, JSON, OAuth, Rust, TUI, TUI watch mode, command-line, command-line utility, configuration, cost, local, local cost scan, multi-provider, scan, token, token usage, utility, watch
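Putting the commands from the summary together, a typical session looks like this:

```sh
cargo install fuelcheck-cli            # or build from source with `cargo build --release`
fuelcheck-cli setup                    # auto-detects local credentials (Codex, Claude, Gemini, ...)
fuelcheck-cli usage                    # token usage across configured providers
fuelcheck-cli usage --watch            # live TUI watch mode
fuelcheck-cli cost --provider codex    # local cost scan for a single provider
```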
    github.com 2 days ago
532.  HN Show HN: SkillDeck – macOS app to manage skills across multiple AI agents
SkillDeck is a macOS application designed to streamline the management of skills across various AI code agents by providing a desktop graphical user interface (GUI). This tool eliminates manual file editing and symlink configuration, offering users an intuitive way to manage their development environment. SkillDeck supports multiple AI code agents such as Claude Code, Codex, Gemini CLI, Copilot CLI, and OpenCode, enabling seamless interaction through features like multi-agent support, a unified dashboard, one-click installation from GitHub, automatic updates, and an SKILL.md editor with live preview functionality. The application is built using the Model-View-ViewModel (MVVM) architecture and leverages @Observable in macOS 14+ to monitor changes efficiently. The system treats directories containing SKILL.md files as a database for storing skills, which simplifies file management tasks. Users can install SkillDeck through several methods: by downloading a universal binary from GitHub, using Homebrew, or building it from source with Swift on macOS Sonoma. This flexibility ensures that developers of varying skill levels can easily set up and use the application. SkillDeck is designed to ensure thread-safe access to the filesystem using Swift actors, which enhances its performance and reliability. The project encourages community contributions by allowing users to fork and submit pull requests, in line with guidelines outlined in its development documentation. Licensed under MIT, SkillDeck aims to provide a robust tool for developers seeking an efficient way to manage AI agent skills within their macOS environment. Keywords: #phi4, AI agents, CLI, GUI, GitHub, Homebrew, MIT license, MVVM architecture, SKILLmd editor, SkillDeck, SkillManager, Sonoma, Swift, Xcode, YAML parsing, agent assignment, auto-refresh, build from source, contributing, desktop app, filesystem database, installation, macOS, multi-agent support, services actor, skills management, symlink management, universal binary, update checker
    github.com 2 days ago
563.  HN Kintsugi
Kintsugi is a specialized development environment created by Sonar designed to enhance the workflow of CLI agent users in managing and reviewing AI-generated code changes. It operates as an Agentic Development Environment (ADE), focusing on orchestrating agents for code review rather than direct coding, which distinguishes it from conventional Integrated Development Environments (IDEs). The system augments existing CLI agents such as Claude Code, Gemini CLI, and Codex by integrating visual capabilities to improve their functionality without supplanting these tools. At present, Kintsugi's support is exclusive to the Claude Code agent, thereby providing a tailored interface for reviewing and managing code changes produced by this specific AI tool. Keywords: #phi4, AI-generated changes, Agentic Development Environment (ADE), CLI agent, Claude Code, Codex, Gemini CLI, Kintsugi, Sonar, agents, code review, orchestration, quality checks, security checks, visual capabilities, workflow
    events.sonarsource.com 2 days ago
586.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a tool specifically crafted to manage numerous instances of the software Claude Code, each housed within its distinct Docker container. It provides an intuitive dashboard facilitating oversight and swift setup with pre-configured defaults, ensuring efficient session management. The underlying container-based architecture guarantees isolation from the host system while offering faster initialization compared to traditional virtual machines, allowing parallel task execution without interference among sessions. The tool is initially set up with Ubuntu 24.04, Node.js 24 (LTS), and Claude Code version 2.1.32, along with optional integrations like Gemini CLI and Slack read access. It features a web-accessible terminal via ttyd, retains conversation histories for ongoing tasks, and securely manages authentication tokens. Key functionalities of SafeClaw include lightweight container management, independent session operation with rapid start/stop processes, persistent conversation history, straightforward integration of additional tools, and a user-friendly command-line interface to manage sessions. The dashboard aids in creating and managing sessions while displaying live activity, making SafeClaw ideal for research or experimentation requiring multiple concurrent instances of Claude Code. Keywords: #phi4, CLI, DX plugin, Docker, Gemini, GitHub CLI, JSONL files, Nodejs, Playwright MCP, SafeClaw, Slack, Ubuntu, authentication, auto-compact, containers, context usage, environment variables, npm scripts, secrets management, tmux, ttyd, volume mounts
    github.com 2 days ago
709.  HN Watching Code Fly By
On February 14, 2026, the author explores the advantages and significance of rapidly observing code changes—referred to as code "flying by"—in contexts like diffs from pull requests or through tools like Claude Code. Often overlooked or undervalued, this approach enables developers to swiftly identify potential issues such as poor encapsulation, unnecessary system scans, unwanted dependencies, and misplaced fixes. The skill of quickly assessing these changes is likened to the rapid interpretation of road signs or sports broadcasts, where seasoned code readers can detect problems efficiently. While tools like the Gemini CLI currently provide effective displays of relevant code modifications, there remains room for improvement in how this information is presented. The author underscores that although thorough reading remains valuable, quick assessments are sometimes adequate, particularly when supported by tests or AI-driven confidence measures. This method's utility is compared to reviewing status reports or stock listings, underscoring its increasing relevance and importance within the realm of software development. Keywords: #phi4, AI coding, CLI, code, dependencies, diffs, logic encapsulation, performance, problem location, pull requests, readers, terminal tools, tests pass
    www.natemeyvis.com 3 days ago
749.  HN Show HN: DevDay – End-of-day recap for AI coding session
DevDay is a command-line utility tailored for developers who utilize AI coding assistants such as OpenCode, Claude Code, and Cursor. It offers an end-of-day recap by analyzing local session data, aligning it with Git commits, and optionally producing standup summaries through services like OpenAI or Anthropic, all while prioritizing privacy by executing operations locally unless users specifically opt for LLM-generated summaries. The tool’s key features include the ability to scan AI coding sessions without transmitting data externally (except when summary generation is chosen), presenting details such as tokens used, estimated costs, session durations, and models involved. DevDay can also categorize sessions by project alongside corresponding Git commits, and it facilitates the creation of first-person standup messages. Currently supporting macOS, DevDay installs through npm with a straightforward command (`npm install -g devday`) and provides various command options to generate recaps for today's work or specific dates in different formats. Users can enable summary generation by configuring API keys for OpenAI or Anthropic. Additionally, the tool assesses session durations based on message processing times and estimates costs using token counts when necessary. Keywords: #phi4, AI coding, API key, Anthropic, Claude Code, Cursor, DevDay, LLM summaries, OpenAI, OpenCode, cost estimation, git commits, local data, macOS support, message processing, model pricing, npm install, project directory, standup summaries, token counts
    github.com 4 days ago
752.  HN GLM-5 topped the coding benchmarks. Then I used it
GLM-5, an open-source AI model developed by Zhipu AI under the MIT license, demonstrates high efficacy on coding benchmarks such as SWE-bench and Terminal-Bench 2.0 but shows mixed results in more complex evaluations. When tested on a unique NP-hard problem (KIRO) and Terminal-Bench, GLM-5's performance was inconsistent; it showed competitive capabilities in some best-case scenarios but often generated invalid outputs with high variability between trials. Furthermore, the model frequently encountered timeout issues, indicating challenges in maintaining reliable execution under practical constraints. In the KIRO test, GLM-5 performed averagely compared to other agents and frequently failed to complete tasks within time limits. On Terminal-Bench, its success rates varied significantly based on different frameworks, with Claude Code achieving 40.4% task completion and Mistral Vibe at 48.3%. This contrasts sharply with Zhipu AI's reported scores of 56-61%, attributed to differences in testing conditions such as time limits, infrastructure, and model parameters. Analysis of execution traces reveals that while GLM-5 comprehends appropriate algorithms, it struggles with the depth and reliability required for consistent task completion. The model also faced difficulties with file editing tasks due to unfamiliar formats, suggesting potential improvements through fine-tuning on specific agent interfaces. Overall, although not fundamentally flawed, GLM-5's real-world performance indicates a need for enhancements to ensure a more consistent user experience, highlighting the gap between its theoretical benchmarking success and practical usability in varied contexts. Keywords: #phi4, API, Anthropic, CPU constraints, Claude Code, Coding Plan subscription, GLM-5, Go condition, HuggingFace, KIRO, MIT License, Mistral Vibe, NP-hard optimization, OpenAI-compatible, SWE-bench, Terminal-Bench, Zhipu AI, agent frameworks, coding benchmarks, file editing, fine-tuning, invalid output, memory constraints, open-source, think mode, timeout, token limits, trajectory analysis, variance, wall-clock time limits
    charlesazam.com 4 days ago
842.  HN Weird System Prompt Artefacts
The article by Srihari Sriraman on the nilenso blog delves into "Weird System Prompt Artefacts," discussing the role of system prompts in mitigating undesirable behaviors exhibited by language models. It examines how these prompts evolve over time through various modifications or "patches" to address specific issues like link generation, verbosity, and interaction styles. Key points include:
- **Claude Code** uses instructions to prevent URL creation, aiming to reduce risky behavior stemming from non-programming contexts.
- **Cursor & Codex CLI** focus on using precise tool names for file edits to minimize errors; Cursor employs heuristics due to frequent user-model co-authorship, whereas Codex shifts away from ChatGPT-style interactions toward more autonomous operations.
- **Gemini CLI** and **OpenHands** highlight concerns about token consumption, reflecting an awareness of resource usage during model operations.
- A comparison between **Codex and Gemini** on test management reveals differing philosophies: Codex avoids adding tests to untested codebases, while Gemini advocates for including tests with new features.
These examples collectively illustrate how engineers adapt system prompts to manage learned behaviors and biases in models, enhancing safety and efficiency. Keywords: #phi4, System prompts, URL generation, anti-comment, binary generation, concurrency control, context distraction, corrective instructions, high-verbosity code, identity strings, legacy prompt, link hallucination, markdown etiquette, model behavior, test addition, token consumption, validation phrases, workspace-native behavior
    blog.nilenso.com 4 days ago
901.  HN Show HN: Kintsugi – A desktop app for reviewing Claude Code sessions
Kintsugi is an innovative desktop application developed by Sonar's engineering team to augment Claude Code sessions, functioning primarily as an Agentic Development Environment (ADE). It focuses on orchestrating and reviewing AI-generated code rather than writing it, with the objective of enhancing both code quality and security while preserving rapid development cycles. The tool offers several key features: parallel orchestration of agents, AI-driven code reviews resembling pull requests complete with commenting functions, plan reviews similar to Google Docs, and integrated Sonar analysis for detecting local issues. Although predominantly constructed using Claude Code itself, Kintsugi is currently only available on macOS, despite internal versions existing for Linux and Windows platforms. The application serves as a prototype aimed at gathering user feedback and guiding future improvements. Kintsugi emphasizes seamless visual integration with CLI agents, providing users with extensive workflows to confidently manage AI-generated code changes, thus ensuring robust and secure development practices. Keywords: #phi4, AI code review, AI generated code, Agentic Development Environment (ADE), CLI agent, Claude Code, Code Review, Codex, Gemini CLI, IDE-like, Kintsugi, Sonar analysis, SonarQube, desktop app, feedback, macOS, orchestration, parallel agents, prototype, quality checks, security checks, visual capabilities
    events.sonarsource.com 5 days ago
965.  HN Safe YOLO Mode: Running LLM Agents in VMs with Libvirt and Virsh
The guide offers comprehensive instructions for setting up isolated environments for Large Language Model (LLM) agents on Linux servers using Libvirt and Virsh, specifically within virtual machines. This approach is crucial in minimizing security risks by creating controlled environments, especially when LLMs operate with extensive permissions ("yolo mode"). The document underscores the advantages of Libvirt over Lima, highlighting its suitability for production-grade server contexts due to lower resource demands and robust management capabilities. To set up this environment on Ubuntu/Debian systems, users must install QEMU, libvirt, and associated tools. The guide details the process of downloading a pre-built Ubuntu cloud image, resizing it, and creating a new virtual machine using `virt-install`. Various virsh commands are provided to manage these VMs, including starting or stopping them, accessing consoles, managing snapshots, and cloning. The document also offers additional tips for optimizing the VM environment with tools like Tmux, fzf, Go, Docker alternatives such as containerd/nerdctl, and Node.js. It addresses SSH access configuration via Tailscale or internal IPs to enable remote management. For network configurations, while default NAT setups are suggested, bridged networking is recommended for production environments. Users can further tailor their VMs using custom cloud-init scripts for automated provisioning. The guide concludes by summarizing essential commands and installation steps to assist users in efficiently implementing the setup process. Keywords: #phi4, LLM agents, Libvirt, Linux servers, Tailscale, Ubuntu, VMs, Virsh, cloud-init, isolation, networking, provisioning, qemu-kvm, snapshots
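A condensed sketch of the flow described above, with illustrative names and sizes (the exact package set, image file name, and `--os-variant` value may differ on your system):

```sh
# Debian/Ubuntu host packages
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients virtinst

# Grow the downloaded Ubuntu cloud image and create the VM
qemu-img resize ubuntu-cloud.img +20G
virt-install --name llm-agent --memory 4096 --vcpus 2 \
  --disk path=ubuntu-cloud.img --import --os-variant ubuntu24.04 --noautoconsole

# Day-to-day management
virsh list --all
virsh start llm-agent
virsh console llm-agent
virsh snapshot-create-as llm-agent pre-yolo   # snapshot before letting an agent loose
virsh shutdown llm-agent
```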
    www.metachris.dev 5 days ago
   https://github.com/nibzard/agentlab   3 days ago
985.  HN Conductor Update: Introducing Automated Reviews
The Conductor extension for Gemini CLI has introduced an Automated Review feature aimed at improving AI-assisted engineering processes through enhanced validation and reporting following code implementation. This new capability enables developers to ensure that their code meets quality standards and adheres to predefined guidelines, thus facilitating the verification of compliance during development. By generating a comprehensive post-implementation report automatically upon completion of coding tasks, Conductor effectively closes the loop in the development lifecycle, providing an end-to-end solution for maintaining high standards in software engineering practices. Keywords: #phi4, AI-assisted engineering, Automated Reviews, Conductor, Gemini CLI, code quality, coding agent, compliance, context-driven development, execution, markdown files, planning, post-implementation reports, validation, verify step
    developers.googleblog.com 5 days ago
994.  HN Show HN: Context Lens: Devtools for your agent context
Context Lens is a sophisticated local development tool specifically crafted for developers utilizing large language models (LLMs), such as Claude Code, Codex, Gemini CLI, Aider, and Pi. It functions as an intermediary proxy between coding tools and LLM APIs, capturing API calls without necessitating code alterations within the tools themselves. The core features of Context Lens include composition breakdown to provide visual insights into components filling the context window (e.g., system prompts, tool definitions), cost tracking for estimating expenses per turn or session across different models, conversation threading to organize API calls by sessions and interactions between agents and subagents, and an agent breakdown detailing token usage and costs per agent. Additionally, it offers a timeline visualization with filtering capabilities, context diff to show changes over turns, and a findings panel that flags potential issues like large tool results or risks of context overflow. The tool also supports automatic detection and data exporting in LHAR format. Installation is straightforward via npm or pnpm, including direct npx execution, and it accommodates multiple environments through reverse proxies, even handling HTTPS interception as required. Context Lens is designed to operate entirely on a developer's local machine, ensuring privacy and control over captured data, making it particularly useful for developers facing challenges with closed-source tools that cannot be directly instrumented. While it provides detailed observability into LLM session context composition to optimize usage without altering tool code, it is not intended for production monitoring or team dashboards; other solutions like Langfuse are recommended for such needs. The tool operates under an MIT license and stores captured requests both in memory (up to 100) and persistently across restarts. Keywords: #phi4, Agent context, Composition breakdown, Context Lens, Cost tracking, Devtools, Environment Variables, HTTPS interception, Installation, LLM API, Local proxy, Proxy, Reverse proxy, Supported Providers
    github.com 5 days ago
1109.  HN I turned old laptops into an AI coding farm ($15/month vs. Devin's $500)
Ralph Loops is an open-source initiative that repurposes old laptops into a cost-effective autonomous AI coding system, offering significant savings over traditional services by operating at around $15 per month compared to more expensive alternatives like Devin's $500/month service. The project leverages repurposed hardware within a Tailscale VPN on a trusted network and features an architecture comprising one control PC (running Windows) and multiple worker PCs. These workers execute various tasks overnight using tools such as the Claude CLI, with Gemini serving as a backup. The system assigns specific roles to worker PCs, including backend, frontend, tests, design, utility functions, manager, and additional utility operations. Task execution is controlled by scripts like `start-night.sh` and managed by a designated manager PC. Tasks are defined in markdown files stored within a GitHub repository, which acts as the central source of truth for task coordination. Security is a critical component of Ralph Loops, emphasizing operation on trusted networks to ensure configurations, task files, and AI agents undergo strict validation processes that prevent unauthorized access or misuse. Measures include input validation, explicit staging with `git`, and sanitized shell commands to bolster security. The system supports autonomous overnight execution, enabling the manager PC to review outcomes in the morning, generate tasks for any failures, and document lessons learned. Designed explicitly for trusted environments due to its reliance on elevated privileges and private networks, Ralph Loops is unsuitable for untrusted or public-facing deployments. Setup prerequisites include at least three old laptops running Linux, a Tailscale account, and access either to the Claude API or an Anthropic Max subscription, along with Gemini CLI. Currently in version 1.0, Ralph Loops features heartbeat monitoring, task recovery, and automatic validation. Future enhancements aim to integrate web dashboards and support multiple projects. Operating under the MIT License, Ralph Loops provides comprehensive documentation and a contributing guide, facilitating user implementation and extension of its capabilities. Keywords: #phi4, AI coding farm, Claude CLI, Gemini fallback, Git coordination, Tailscale VPN, autonomous agents, manager-worker architecture, mentor oversight, open-source system, repurposed hardware, security model, task execution
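Tasks live as markdown files in the coordinating GitHub repository; the exact schema is defined by the project, so the file below is only a hypothetical illustration of what one such task might contain.

```markdown
# Task: add-retry-to-http-client
role: backend          # which worker PC picks this up
priority: high

## Goal
Add exponential backoff with jitter to the outbound HTTP client.

## Done when
- Unit tests cover the retry path and pass locally.
- Changes are staged explicitly and pushed on a feature branch for morning review.
```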
    github.com 6 days ago
1133.  HN I benchmarked 4 coding agents on an NP-hard problem I solved 8 years ago
This summary examines the comparative analysis of four coding agents—Claude Code, Codex, Gemini CLI, and Mistral—on an unpublished NP-hard fiber network optimization problem initially solved by the author using C++. The task involves designing a fiber network to connect cell towers with specific constraints on redundancy loops and branches. Claude Code notably outperformed the author's solution in one of three trials, demonstrating its efficacy under various testing conditions that included different programming languages (Python versus Go) and varying time limits (30 minutes versus 1 hour). The study's key findings reveal several critical insights into AI agent performance optimization. First, the practice of prompt engineering—offering a specific target hint—significantly enhanced agent performance compared to vague prompts like "keep improving," which were particularly ineffective for weaker agents such as Mistral. The choice of programming language played a pivotal role in the benchmarking process; Python was found to be superior due to Go's challenging compilation requirements, which often led to invalid solutions from skipped validation steps. Furthermore, Claude Code’s iterative improvement strategy proved more successful than Mistral's one-shot heuristic approach. This highlights the advantage of continuous refinement over single-attempt solutions in complex problem-solving scenarios. Additionally, while increased time allocation did not universally enhance performance, it benefited agents like Claude Code that were equipped with effective frameworks to utilize additional time for improvement. The analysis also identified common failure modes, including constraint violations and challenges related to output formatting or file saving—issues arising from attempts at intricate optimizations without sufficient validation steps. Overall, the study underscores the significance of prompt engineering, iterative solution development, and strategic language selection in optimizing AI agent performance on complex tasks. While acknowledging the limitations of this single-task benchmark, such as a small sample size and specific conditions, it offers valuable insights into the capabilities of coding agents beyond conventional benchmarks. Keywords: #phi4, Docker container, Go language, NP-hard problem, Python, agent reliability, algorithm efficiency, benchmarking, coding agents, constraint violations, fiber network, iterative optimization, simulated annealing, solution validation
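The "iterative improvement" strategy credited to the stronger agents is essentially a refinement loop such as simulated annealing (listed among the keywords above). The generic Python skeleton below illustrates the pattern; it is not the author's or any agent's actual solution, and `cost` and `perturb` stand in for problem-specific functions.

```python
import math
import random

def anneal(initial, cost, perturb, steps=10_000, t0=1.0, cooling=0.999):
    """Iteratively refine a feasible solution: apply a small random change, keep it if it
    helps, and occasionally keep a worse one to escape local optima."""
    current, best = initial, initial
    temp = t0
    for _ in range(steps):
        candidate = perturb(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling            # gradually become greedier
    return best
```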
    charlesazam.com 6 days ago
1180.  HN Show HN: SuperLocalMemory– Local-first AI memory for Claude, Cursor and 16+tools
SuperLocalMemory V2 addresses the challenge of "amnesia" in AI tools by providing a robust local-first memory system that allows developers to maintain continuity across sessions without repeatedly re-explaining project contexts, coding preferences, and past decisions. It ensures data privacy and ownership through local storage and seamlessly integrates with over 16 AI tools like Claude Desktop, Cursor, Windsurf, VS Code, among others, requiring zero setup or external configurations such as API keys. The system employs a sophisticated 10-layer architecture, featuring A2A Agent Collaboration, Web Dashboard, Hybrid Search, Pattern Learning, and Knowledge Graphs to enhance functionality. Key technical aspects include its foundation on research like the A2A Protocol, GraphRAG, MACLA Bayesian learning, and A-RAG hybrid search, adapted for local implementation. It utilizes SQLite with FTS5 and TF-IDF vectors to achieve efficient searching capabilities, maintaining sub-second performance even with large datasets. The system is designed to recognize user patterns over time, offering more personalized assistance while supporting multiple profiles to prevent context overlap between projects. Installation is straightforward via npm or by cloning its GitHub repository, as SuperLocalMemory V2 auto-configures itself for various environments and tools. Compared to cloud-based alternatives that often entail costs and privacy issues, SuperLocalMemory V2 stands out by being free, local, and fully private, making it an all-encompassing solution for persistent context maintenance in AI-driven development settings. Keywords: #phi4, AI memory, Bayesian confidence, CLI commands, SQLite storage, SuperLocalMemory, hierarchical clustering, knowledge graph, local-first, multi-tool integration, pattern learning, privacy, real-time dashboard, zero cost
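The storage approach (SQLite with FTS5) can be illustrated with a few lines of standard-library Python; the table layout and file name are hypothetical rather than SuperLocalMemory's actual schema, and the example assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect("memory.db")  # hypothetical database file
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(content, project)")
conn.execute(
    "INSERT INTO memories (content, project) VALUES (?, ?)",
    ("Prefer pytest over unittest in this repo; tests live under tests/unit", "acme-api"),
)
conn.commit()

# Full-text query, best matches first (FTS5 exposes a built-in rank)
for (content,) in conn.execute(
    "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank", ("pytest",)
):
    print(content)
```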
    github.com 6 days ago
1272.  HN Show HN: Open-Source Skills for AI Agents
The "Awesome AI Agent Skills" repository provides a comprehensive suite of over 70 open-source skills designed to bolster AI agents' functionality across diverse domains such as artificial intelligence/machine learning (AI/ML), API integration, code development, communication, and data analytics. These modular skills adhere to a standard format, ensuring compatibility with popular platforms like Claude Code, OpenAI Codex, and GitHub Copilot. Each skill is organized in its own directory, complete with a SKILL.md file that offers structured instructions and metadata, enabling users to seamlessly integrate these capabilities into their projects. The repository categorizes the skills into 14 distinct areas, including data analysis, cloud monitoring, content strategy, and security auditing, aiming to streamline development tasks such as model training, API design, code documentation, and marketing analytics. The project encourages community involvement by inviting contributions for new or improved skills, as outlined in the CONTRIBUTING.md file. Released under the MIT License, this collection supports extensive usage and collaboration within the AI community, facilitating innovation and efficiency in AI agent development. Keywords: #phi4, AI Agents, Automation, Categories, Code Generation, Community-driven, Contributions, Data Analysis, Design, Development, Documentation, Integration, License, MIT, Markdown, Modular, Open-Source, Platforms, Repository, Reusable, SKILLmd, Security, Security Audits, Skills, Workflow, Writing, YAML
    github.com 7 days ago
1281.  HN Entire - hooks into your Git workflow to capture AI agent sessions
The tool "Entire" is designed to enhance the integration of AI agents within a Git workflow by automatically capturing and indexing AI agent sessions during code development. It stores these sessions as metadata in a dedicated branch (`entire/checkpoints/v1`), separate from traditional code commits, allowing developers to maintain a searchable history of how their code was crafted. Entire integrates seamlessly with Git, capturing session data on every push and offering robust workflow management through commands like `enable`, `disable`, `status`, `rewind`, and `resume`. These features facilitate efficient session tracking and version control, accommodating two checkpointing strategies: manual-commit and auto-commit. To set up Entire, prerequisites include having Git installed, operating within a supported OS (macOS or Linux via WSL), and using an authenticated AI agent CLI like Claude Code or Gemini CLI. Installation can be performed through Homebrew or Go, followed by running `entire enable` to initialize hooks in the project repository. The workflow involves enabling hooks with either checkpointing strategy, managing sessions in the background, and utilizing commands for rewinding changes or restoring session metadata. Configuration is handled via JSON files located in a `.entire/` directory within the project, allowing users to set preferences such as strategy type, logging levels, and telemetry options. Users can also make local configuration adjustments that won't affect team settings when committed to Git. Common issues like "Not a git repository" errors or SSH authentication problems are addressed by ensuring the current working directory is a Git repository or configuring SSH host keys appropriately. Entire leverages `mise` for task automation and dependency management, and it supports screen reader accessibility through an accessible mode. The project encourages community engagement by inviting users to report bugs or request features via GitHub issues, underscoring its commitment to continuous improvement in facilitating AI-driven development within Git workflows. Keywords: #phi4, AI agent, CLI, Entire, Git, checkpoints, commits, configuration, hooks, sessions, strategies, troubleshooting, workflow, worktrees
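A minimal workflow using the subcommands named above (exact flags and prompts may differ; see the project README):

```sh
cd my-project            # must be a Git repository
entire enable            # install the hooks; pick manual-commit or auto-commit checkpointing
entire status            # confirm capture is active
# ... code with Claude Code or Gemini CLI, commit and push as usual ...
entire rewind            # roll back to an earlier checkpoint
entire resume            # restore session metadata and continue a captured session
entire disable           # remove the hooks when finished
```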
    github.com 7 days ago
1323.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a management tool for handling multiple instances of Claude Code running in Docker containers, providing an efficient and isolated environment compared to full virtual machines. It features an intuitive dashboard that allows users to oversee sessions easily, with the ability to set up new instances using default settings swiftly. The platform supports concurrent execution of diverse research tasks without session interference, ensuring each conversation history is saved locally for persistence across restarts. Users can start new instances through a simple script (`./scripts/run.sh`), customize their setup by mounting local projects, and manage sessions with additional scripts provided. SafeClaw offers optional integrations such as Gemini CLI or Slack read access, operating on an environment that includes Ubuntu 24.04, Node.js 24 (LTS), Claude Code version 2.1.32, GitHub CLI, Playwright MCP, among other tools. Security is maintained by running with `--dangerously-skip-permissions` in a containerized setup, which is deemed secure. Authentication tokens are securely managed for each session, with the option to add further secrets as needed. The dashboard, initiated through `node dashboard/server.js`, enables users to create and control sessions while viewing live iframes of active ones. Interaction with SafeClaw is facilitated via various npm scripts and shell aliases within containers. Keywords: #phi4, CLI, Chromium, Docker, Gemini, GitHub, JSONL files, MCP, Nodejs, Playwright, SafeClaw, Slack, Ubuntu, aliases, authentication, containers, environment variables, npm scripts, skills, tmux, ttyd, web terminal
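The two entry points mentioned above are enough for a basic session:

```sh
./scripts/run.sh            # start a new Claude Code instance in its own container
node dashboard/server.js    # serve the dashboard for creating and monitoring sessions
```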
    github.com 7 days ago
1355.  HN Gemini writes, Claude polishes, JetBrains rests: an agent development pipeline
In November 2025, a seasoned technical director transitioned from traditional Integrated Development Environments (IDEs) to an innovative agent-based development pipeline leveraging AI models for enhanced efficiency and cost-effectiveness. This new workflow utilizes three AI models: Gemini handles routine code generation tasks such as boilerplate creation, GLM steps in when Gemini reaches its limits, and Claude Code is reserved for more complex duties like refactoring and making architectural decisions. The director developed a Command Line Interface (CLI) tool named Gokin in Go to manage these AI resources efficiently, ensuring cost savings by using less expensive models for routine tasks while reserving the pricier Claude model for sophisticated work. The pipeline operates much like an assembly line where each AI agent manages specific stages of software development. This strategy results in significant cost reductions—around $130-$180 monthly per project or approximately $1500-2000 annually, compared to relying solely on Claude Code. Security is meticulously maintained by redacting sensitive information such as API keys and passwords before processing through the AI models. The agent-based approach not only improves efficiency but also shifts developers' focus from syntax-oriented tasks to higher-level architectural concerns, thus reducing cognitive load and boosting productivity. While IDEs remain useful in specific areas like frontend development, this pipeline is particularly advantageous for backend programming with languages such as Go, PHP, and Python. The open-source nature of Gokin, available on GitHub, encourages community involvement and further enhancements. Keywords: #phi4, AI models, Agent-based programming, Claude Code, Gemini CLI, GitHub Copilot, Go language, Gokin, IDEs, JetBrains Toolbox, agent management, architecture, backend development, cognitive load, cost efficiency, development pipeline, digital juniors, prompt engineering, provider agnosticism, security, technical director, terminal
    The google logo   ginkida.dev 7 days ago
1368.  HN Show HN: Claudit – Claude Code Conversations as Git Notes, Automatically
Claudit is an advanced tool designed to enhance code collaboration by automatically saving conversations from Claude Code into Git Notes for every commit, providing a comprehensive audit trail of discussions leading up to changes in the codebase. It utilizes agent interactions and Git hooks to ensure these conversation notes are consistently attached to commits across multiple developers working within the same repository. A key feature of Claudit is its ability to automatically generate and attach conversation notes during both developer-initiated commits and those made by Claude Code itself, ensuring seamless integration without disrupting workflows. The tool supports collaboration among multiple developers by merging conversation notes from various contributors without data loss, even when multiple notes reference the same commit. It is compatible with Git worktrees, allowing conversations to be scoped to individual branches while sharing hooks across them, which enhances flexibility and efficiency in development environments that utilize branching strategies extensively. Claudit maintains note integrity during rebase operations by leveraging git's `notes.rewriteRef` configuration, ensuring that notes stay linked to their respective commits regardless of any structural changes. Additionally, Claudit handles the complexities introduced by GitHub's "Rebase and merge" strategy by remapping orphaned conversation notes to new commit IDs when SHAs change. To facilitate its use, Claudit offers a suite of commands such as `claudit list` and `claudit show [ref]` for viewing conversation histories, along with `claudit resume <commit>` to continue discussions from specific commits. Developers can visualize these notes through the `claudit serve` command and manage synchronization with remote repositories using `claudit sync push/pull`. The tool also includes a diagnostic feature (`claudit doctor`) to identify configuration issues, ensuring smooth operation. For effective utilization of Claudit, it is necessary to have Git installed along with the Claude Code CLI for session resumption. This setup supports multi-developer synchronization and is essential for maintaining the integrity and accessibility of conversation notes across collaborative projects. Claudit operates under the AI Native Application License (AINAL), which governs its usage and distribution. Keywords: #phi4, Automation, Branches, CLI, Claudit, Commit, Git, GitHub, Hooks, Merge, Rebase, Sync, Worktrees
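The commands below are the ones listed in the summary, arranged into a plausible day-to-day sequence; placeholder arguments such as `<commit>` are left as placeholders.

```bash
claudit doctor            # check that hooks and notes configuration are healthy
claudit list              # list commits that carry Claude Code conversation notes
claudit show HEAD         # display the conversation attached to a given ref
claudit resume <commit>   # reopen the Claude Code session behind a specific commit
claudit serve             # browse conversation notes in a local web view
claudit sync push         # share notes with the remote; `claudit sync pull` fetches them
```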
    The google logo   github.com 7 days ago
1449.  HN Add-MCP CLI: npx skills but for installing MCP servers
The Add-MCP CLI is a command-line interface designed to facilitate the installation of Model Context Protocol (MCP) servers into various coding agents with ease, similar to `npx` for Node.js packages. It supports multiple platforms such as Claude Code, Codex, Cursor, OpenCode, VSCode, among others, and allows installations via URLs or npm packages using straightforward commands. The tool offers a range of options for customizing the installation process, including global or project-specific installations, targeting specific agents with the `-a` flag, specifying transport types (`http`, `sse`) with `--transport/--type`, adding custom HTTP headers through `--header`, setting server names via `--name`, skipping confirmation prompts with `-y`, and installing to all agents using `--all`. A notable feature is its smart detection capability, which automatically identifies coding agents based on the environment: in project mode by searching for config files like `.cursor/mcp.json` and in global mode by detecting globally installed agents. The CLI supports various transport types, including HTTP (default), SSE (deprecated but still supported), and stdio for local servers, while also allowing custom HTTP headers to be passed, although some agents, such as Goose, do not support this feature. The tool provides a `list-agents` command to display all supported coding agents and their installation scope—either project or global. By default, MCP servers are installed in the project context but can be configured for global installation using the `-g` option. The utility of MCP servers lies in enhancing coding agents by integrating external services, databases, file system access, and specialized tools tailored to specific workflows. For troubleshooting, users should verify server URLs and configuration syntax, ensure there are no naming conflicts with existing servers, and check write permissions on target directories. The tool is licensed under Apache 2.0. Keywords: #phi4, Add-MCP CLI, HTTP headers, MCP servers, Model Context Protocol, coding agents, global mode, installation, project scope, smart detection, supported agents, transport types, troubleshooting
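A hedged sketch of typical invocations; the `npx add-mcp` form is an assumption based on the story title, while the flags shown are the ones the summary lists and the URLs and names are illustrative.

```bash
# Install an HTTP MCP server into a specific agent for the current project
npx add-mcp https://example.com/mcp --transport http --name my-server -a cursor

# Global install across all detected agents, skipping confirmation prompts
npx add-mcp https://example.com/mcp -g --all -y

# Show every supported coding agent and whether it is configured per-project or globally
npx add-mcp list-agents
```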
    The google logo   github.com 8 days ago
1455.  HN CLI – hooks into your Git workflow to capture AI agent sessions
Entire is a command-line interface tool designed to enhance Git workflows by integrating AI agent session tracking with code commits across macOS, Linux, and Windows via WSL. It requires Git and an authenticated CLI for either Claude Code or Gemini. The tool captures complete interactions as checkpoints within two strategies: manual-commit, which records checkpoints during user or AI-initiated commits, and auto-commit, which does so after each agent response. Entire offers seamless session management, enabling users to rewind or resume sessions at previous checkpoints. It maintains a separate branch (`entire/checkpoints/v1`) for storing session metadata without affecting the main codebase, supporting multiple concurrent AI sessions on the same commit through git worktrees. The typical workflow involves activating Entire in a repository by installing hooks, allowing AI agent interactions to be tracked automatically in the background. Users can manage sessions via commands like `entire rewind` or `entire resume <branch>`, with an option to disable Entire without impacting code history. Configuration settings are managed through JSON files located in `.entire/`, with project-specific configurations committed to Git and personal preferences typically ignored. Entire provides several commands for its management: enabling (`entire enable`), disabling (`entire disable`), checking status (`entire status`), and managing sessions (`entire rewind` or `resume`). Additional functionalities include cleaning up data, fixing issues, and viewing versions. The development of Entire leverages Mise for task automation, requiring users to install Mise and build the CLI according to its configuration. The tool supports accessible mode for screen readers and offers solutions for common problems like SSH authentication errors and conflicts with shadow branches. Under the MIT License, Entire encourages open-source contributions and bug reporting via its GitHub repository. Keywords: #phi4, AI agent, CLI, Git, checkpoints, commits, configuration, hooks, metadata, sessions, strategies, troubleshooting, workflow, worktrees
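The session-management commands named in the summary, shown in a typical order; `<branch>` stays a placeholder.

```bash
entire enable             # install the Git hooks in the current repository
entire status             # see which strategy is active and what is being tracked
entire rewind             # step back to an earlier checkpoint
entire resume <branch>    # continue a captured AI session from its checkpoint branch
entire disable            # stop tracking without touching existing code history
```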
    The google logo   github.com 8 days ago
1527.  HN Hunter3 Is Not OpenClaw
Hunter3 is an advanced AI assistant designed to seamlessly integrate messaging channels with large language model (LLM) providers and external tools using an IRC-based communication system managed via a WebSocket gateway. It enables real-time interaction management, allowing for on-the-fly self-modifications that ensure automatic reconnection to the IRC server upon changes. The architecture routes messages from various channels to agents interacting with LLMs like Claude CLI, Ollama, or Gemini CLI within a secure framework designed in Go 1.24+. Hunter3 is highly configurable through YAML files and offers extensive extensibility via Model Context Protocol (MCP) servers for API interactions and Docker container management. Key features include self-modifying capabilities, structured logging with zerolog, and support for plugin systems enabling custom event hooks. It provides flexible session management that supports both per-sender and global scopes, along with streaming support for handling incremental responses from LLMs. Built using a pure-Go SQLite database, Hunter3 ensures secure data handling without relying on CGO operations, enhancing portability. The system allows customization through its configuration files, covering IRC server settings and database options, with binaries generated via build commands like `make build`. Overall, Hunter3 stands as a robust framework for developing AI-driven chatbots and assistants, offering significant extensibility through its plugin architecture and MCP systems. Keywords: #phi4, AI assistant, CLI tools, Hunter3, IRC, LLM providers, MCP servers, SQLite, WebSocket, configuration, event hooks, plugins, self-modifying, streaming support
    The google logo   github.com 8 days ago
1595.  HN Show HN: Agx – A Kanban board that runs your AI coding agents
AGX is a local-first Kanban board designed specifically to manage AI coding tasks using autonomous agents. It addresses the challenge of agent persistence by decoupling control from execution planes, enabling constant-cost task resumption without replaying past interactions. AGX leverages PostgreSQL for state management and supports multiple AI providers such as Claude Code, Gemini CLI, and Ollama. The platform emphasizes durable, resumable execution through a bundled dashboard that allows live monitoring of the system's state, alongside features supporting multi-provider integration and customizable project-specific workflows. Unlike conventional chat UIs or hosted SaaS services, AGX functions as infrastructure to reliably operate agents on local machines. It offers straightforward setup requirements, including PostgreSQL (which can be managed via Docker) and any AI provider CLI. Users interact with AGX using commands that facilitate task initialization, creation, execution, and monitoring. The architecture of AGX is split between a control plane, responsible for state management and orchestration within PostgreSQL, and a data plane, where execution tasks are handled by the AGX CLI and Daemon. Its technology stack comprises Next.js, Tailwind CSS, PostgreSQL, Node.js, and TypeScript. The project encourages community contributions through GitHub Discussions and Issues, fostering collaborative development and improvement. Keywords: #phi4, AI agents, CLI, Kanban board, PostgreSQL, agent persistence, autonomous agents, control plane, data plane, durable state, local-first, pg-boss, providers, task execution
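Only the PostgreSQL prerequisite is concrete enough to sketch here; the `docker run` invocation below is one standard way to satisfy it, with illustrative credentials rather than AGX defaults, and AGX's own task commands are omitted because the summary does not spell them out.

```bash
# Run a throwaway PostgreSQL instance for AGX's control plane
docker run -d --name agx-postgres \
  -e POSTGRES_PASSWORD=agx \
  -p 5432:5432 \
  postgres:16
```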
    The google logo   github.com 8 days ago
1597.  HN Show HN: A CLI tool to automate Git workflows using AI agents
"Git PR AI" is a command-line tool that automates Git workflows using artificial intelligence to enhance tasks such as creating branches, preparing pull requests (PRs), and conducting code reviews. It integrates with platforms like GitHub and GitLab through gh and glab respectively, and collaborates with various AI agents including Claude Code, Gemini CLI, Cursor Agent, and Codex CLI. A primary design objective of the tool is to maintain agent-agnostic functionality, allowing users to switch between different AI tools seamlessly without needing custom prompts or adopting specific Message Completion Protocols (MCP). This feature, coupled with a quick setup process from installation to executing the first PR, significantly simplifies Git workflows. The utility offers project management integrations such as utilizing JIRA tickets for automatic branch name and context generation. Installation is straightforward via `pnpm add -g git-pr-ai`, which grants access to various Git subcommands directly in the terminal. It provides numerous features like AI-generated commit messages, contextual PR descriptions, real-time code reviews with improvement suggestions, and weekly summaries for project reviews or standups. These capabilities aim to streamline development processes by reducing manual intervention. "Git PR AI" ensures full compatibility across multiple platforms and AI providers, accommodating diverse user configurations. For further information, users can refer to the comprehensive documentation available in the repository at https://github.com/leochiu-a/git-pr-ai. Additionally, user feedback and inquiries are encouraged to enhance the tool's functionality and usability. Keywords: #phi4, AI agents, Branch creation, CLI tool, Claude Code, Code reviews, Codex CLI, Commit messages, Cursor Agent, Gemini CLI, Git workflows, GitHub, GitLab, Installation, JIRA tickets, PR creation, PR descriptions, Pull Requests, Semantic branch names, Subcommands, Weekly summaries
    The google logo   github.com 8 days ago
1601.  HN Grumpy Julio plays with CLI coding agents
The author shares their journey with Claude Code, an AI-based coding agent, reflecting on initial skepticism due to prevalent issues like code bloat and poor quality. Despite these concerns, the author discovered that Claude Code significantly enhanced productivity for straightforward and repetitive tasks, even without deep technical expertise, by aiding in feature implementation, script writing, and Emacs plugin creation. While acknowledging its utility, the author cautioned against over-reliance on AI-generated code, noting it often necessitates substantial human refinement to achieve production quality and efficiency. Ultimately, the author concluded that while coding agents are beneficial for specific tasks, they should complement rather than replace traditional programming skills and critical thinking in software development. Keywords: #phi4, AI tools, AI-based coding, C++ compiler, Claude Code, Emacs, EndBASIC, EndTRACKER, GitHub, LLMs, NixOS, PRs, Servo, code duplication, coding agents, integration, iteration, maintenance costs, nixpkgs, performance problems, personal productivity, productivity, prompts, review, slop, software bloat, software engineering, software projects, ticket tracker, ticketel, tool belt, web browser
    The google logo   jmmv.dev 8 days ago
1626.  HN Show HN: SpecOps – Spec-Driven Development for Infrastructure as Code
SpecOps is an open-source Command Line Interface (CLI) framework designed to integrate Spec-Driven Development into Infrastructure as Code (IaC) projects, addressing the challenge of ad-hoc scripting by establishing a structured workflow that progresses from idea conception through planning and execution. This technology-agnostic framework supports tools like Terraform, Pulumi, CloudFormation, and Ansible, incorporating over 17 AI coding agents such as Claude Code and GitHub Copilot to assist at every stage. SpecOps automates the generation of project structure, templates, and command files while providing validation checkpoints and documented rollback procedures for each deployment phase. The framework is inspired by GitHub's Spec Kit but specifically tailored for infrastructure engineering, enforcing a systematic IaC approach through five key steps: establishing principles, defining requirements, creating technical plans, generating task breakdowns, and executing deployments. It supports diverse use cases including multi-organization Kubernetes platforms, entire application stacks, and compliance-ready infrastructures. SpecOps is MIT licensed, encouraging community contributions to enhance AI integrations, cloud templates, documentation, and testing processes. Users can install the CLI tool via a specific command from GitHub, which underscores SpecOps' goal of fostering more organized, reliable, and AI-assisted IaC methodologies for infrastructure teams. Keywords: #phi4, Ansible, ArgoCD, Cilium, Compliance, GitHub, GitOps, Grafana, Infrastructure as Code, Kubernetes, MIT License, Multi-tenancy, Prometheus, RBAC, Scalability, Security, Spec-Driven Development, SpecOps, Terraform
    The google logo   github.com 8 days ago
1666.  HN Show HN: CodeGraphContext- An MCP server that indexes code into knowledge graphs
CodeGraphContext is an advanced MCP server developed to index local code into graph databases, significantly enhancing the capabilities of AI assistants in understanding large codebases. It addresses limitations in traditional RAG systems that often provide excessive or irrelevant context by utilizing Graph RAG technology to deliver precise, relationship-aware insights. Key features include building detailed architecture maps for contextual clarity, synchronizing documentation with evolving code changes, and supporting AI tools in navigation, completion, and debugging tasks. As an MCP server, CodeGraphContext integrates seamlessly with various development environments like VS Code, Gemini CLI, and Cursor. The system offers a range of functionalities: it constructs knowledge graphs from code components, facilitates complex relationship queries (such as callers, callees, and class hierarchies), provides pre-indexed bundles for immediate use, updates the graph in real-time based on directory changes, and operates both as a standalone CLI toolkit and an MCP server. Installation is straightforward via pip, with solutions provided for common issues like PATH errors. The project supports multiple databases including FalkorDB Lite and Neo4j, accommodating numerous programming languages. Users can operate CodeGraphContext in two modes: CLI mode for direct terminal-based code analysis and querying relationships or visualizing graphs, and MCP Server mode to enable natural language queries by AI assistants through configured IDEs or CLI tools. The project, open-sourced under the MIT License, encourages community contributions and discussions on feature enhancements, with detailed guidelines available. Actively maintained by Shashank Shekhar Singh, CodeGraphContext fosters a collaborative space for developers leveraging AI-assisted code analysis. Keywords: #phi4, AI assistants, CLI toolkit, CodeGraphContext, FalkorDB Lite, GitHub, Graph RAG, MCP server, Neo4j, VS Code, code indexing, context-aware, knowledge graphs, natural language queries, repository management, static analysis
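A minimal install sketch; the PyPI package name is assumed to match the project name, and the CLI's subcommands are not reproduced because the summary does not name them.

```bash
# Assumed package name; see the repository for the authoritative install instructions
pip install codegraphcontext
```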
    The google logo   github.com 9 days ago
1733.  HN IDEcline: How the most powerful coding tools became second-class citizens
The article "IDEcline" examines the transformation in the role of Integrated Development Environments (IDEs) as they shift from being central coding tools to platforms that primarily oversee AI-driven agents in software development. Historically, IDEs like Visual Studio and IntelliJ were pivotal due to their features enhancing developer productivity. This centrality is waning with the advent of advanced AI coding tools. The transition unfolded through three distinct phases: initially, AI served as a supplementary tool within IDEs (Wave 1), primarily improving functions such as autocomplete. In the second phase (Wave 2), AI agents were integrated into terminal environments, handling more complex tasks beyond mere code suggestions. The current phase (Wave 3) involves desktop control planes that manage multiple AI agents to execute various development activities, thus shifting the focus from traditional text editors to task dashboards. As IDEs become relegated to "second-class citizens," primarily used for verification and debugging rather than as central hubs, companies like Microsoft, Google, and JetBrains face strategic challenges. These organizations must adapt to a new landscape where agent-first workflows dominate. Critical factors such as security, compliance, and developer trust will determine the success of either standalone control planes or IDE-integrated solutions. The future of software development is increasingly centered on auditing and verifying AI contributions within codebases, representing a shift from traditional editing roles to those emphasizing orchestration and verification. Keywords: #phi4, AI models, IDE, auditing provenance, autocompletion, coding tools, competitive landscape, control planes, desktop applications, long-running jobs, multi-agent tasks, orchestration, parallelism, security compliance, task dashboard, terminal agents, workflows
    The google logo   thenewstack.io 9 days ago
1780.  HN Show HN: Forge – 3MB Rust binary that coordinates multi-AI coding agents via MCP
Forge is an orchestration tool developed in Rust that facilitates coordination among various AI coding agents like Claude Code, Codex CLI, and Gemini CLI. Weighing approximately 3 MB, it addresses prevalent challenges such as file conflicts, knowledge retention issues, and architectural drift by providing a centralized management platform. Its core features include: - **File Locking:** This mechanism prevents multiple agents from editing the same files simultaneously, ensuring seamless collaboration. - **Knowledge Flywheel:** A system for capturing and storing decisions and patterns which can be easily queried to maintain continuity across different sessions. - **Drift Detection:** It evaluates recent changes against a predefined project vision using language models like GPT-4.1, maintaining alignment with the project's specifications. - **Governance:** Conducts health checks on various dimensions such as documentation quality, architecture integrity, and task health to uphold overall project standards. Forge functions as an MCP server via stdio, ensuring compatibility with any AI tool that supports MCP. It features a pluggable "brain" for intelligent decision-making, accommodating both rule-based systems and LLM engines like OpenAI's GPT models. The state is managed through a JSON file located in the `.forge/` directory, making it human-readable and trackable via Git. To set up Forge, users initialize it within their project, generate task plans based on specifications, execute tasks with designated AI tools, and monitor project health through CLI commands or MCP queries. Its architecture supports seamless integration by providing adapters for various supported tools. Licensed under MIT, Forge encourages community contributions to broaden its capabilities, such as adding more brain models or enhancing synchronization processes across different configurations. By unifying multiple AI coding tools under a single orchestration layer, it significantly boosts workflow efficiency and project consistency. Keywords: #phi4, AI integration, AI tools, ASCII dashboard, CLI commands, CLI dispatch, Forge, JSON-RPC 20, LLM engine, MCP server, OpenAI API integration, Rust, actionable findings, architecture, binary size, deterministic operations, drift detection, event logging, file locking, git hygiene, governance, governance score, headless task execution, health check, human-readable cards, intelligent decisions, knowledge base, master plan, multi-agent coordination, orchestration, plan decomposition, pluggable brain, project spec, project state, state reconciliation, statejson, task management, tool adapters, tool inventory, zero runtime deps
    The google logo   github.com 9 days ago
1918.  HN Show HN: SAA – A minimal shell-as-chat agent using only Bash
SAA (Single Action Agent) is a minimalist shell-based chat interface developed as a Go binary, designed to transform terminals into chat environments using only Bash. It was created in response to performance issues and complexity found in existing tools, focusing on simplicity by relying solely on Bash. SAA supports local large language models like GLM-4.7-Flash and manages sessions discreetly without disrupting user workflows. Key features include session management, project-specific configurations, and seamless integration with APIs such as OpenAI. Installation requires Go 1.23 or later, and users can configure it to work with various models through command-line options. The tool encourages customization via scripts and wrappers, allowing for personalized enhancements like UI integrations or notifications. SAA is tailored for Unix users who prefer managing their own sandboxing solutions, such as Docker or bubblewrap, rather than having them built-in. It supports a flexible approach where users can create aliases or build custom chat interfaces to streamline interactions with the agent. As an open-source project under the MIT license, SAA invites community contributions and improvements. Keywords: #phi4, AGENTSmd, Alias, Autonomous Agent, Bash, Bubblewrap, CLI Tools, Chat UI, Chatbot, Configuration, Docker, Ecosystems, Gemini CLI, Go Binary, Installation, LLMs, License, MCP, MIT, OpenAI API Key, Plan Mode, SAA, Sandbox, Session Management, Shell, Shopping Automation, Skills, Sub-agents, Teams, Usage
    The google logo   github.com 10 days ago
1951.  HN Show HN: Agents – Sync MCP Configs Across Claude, Cursor, Codex Automatically
The "Agents" CLI tool streamlines the management of multiple configuration files required for various AI coding assistants such as Codex, Claude, Cursor, and Gemini by centralizing MCP (Model Context Protocol) server configurations into a single source of truth located in `.agents/`. This approach simplifies adding or updating servers across different tools. Key features include a convention-over-configuration design with sensible defaults, a security-first architecture that isolates secrets in a gitignored `local.json`, and an interactive setup wizard to facilitate user onboarding. The tool is rigorously tested with over 70 tests using Vitest. It supports AI coding assistants like Codex, Claude Code, Gemini CLI, Cursor, Copilot, and Antigravity, and can be installed via npm as `@agents-dev/cli` under the MIT license. The quick start process involves installing the CLI tool, initializing it within a project folder, and using commands such as `agents sync` to manage configurations. Users can perform various operations including adding MCP servers, listing them, checking for configuration issues, and auto-syncing changes. The tool enhances existing documentation by offering machine-readable configurations while maintaining human-readable instructions through an `AGENTS.md` file. Community support is available on GitHub where users can report bugs, engage in discussions, and provide feedback about the project. Keywords: #phi4, AGENTSmd, AI coding assistants, API keys, Antigravity, CLI, Claude, Codex, Copilot, Cursor, Gemini, GitHub, MCP, agents folder, agentsjson, bug report, command cheat sheet, configuration, discussion, localjson, multi-LLM development, npm, secrets, skills workflows, star on GitHub, sync, tools
    The google logo   github.com 11 days ago
1978.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
SafeClaw is a sophisticated management tool designed to handle multiple instances of Claude Code running in isolated Docker containers, ensuring both security and efficiency. It offers an easy setup with sensible defaults and includes a web dashboard that simplifies session management. Each instance operates independently within its own container, providing isolation from the host machine and enhancing security by preventing unauthorized access. Key features of SafeClaw include isolation, allowing each Claude Code instance to run without affecting the host system; lightweight operations for quick spin-up, stop, or deletion of sessions, which is faster than using full virtual machines; portability across any Docker-supported machine for consistent environments; and robust session management that supports multiple parallel research tasks or projects with automatic conversation history storage. The setup process involves building a Docker image and starting containers through scripts. The web dashboard aids in creating, managing, and viewing sessions live. Optional integrations such as Gemini CLI and Slack read access are available to enhance functionality. SafeClaw includes components like Ubuntu 24.04, Node.js 24 (LTS), Claude Code 2.1.32, GitHub CLI, Playwright MCP with Chromium, among others. It securely manages authentication tokens and allows customization of environment variables through scripts. Additionally, the tool provides useful command-line operation aliases within containers, streamlining user interaction and workflow management. Keywords: #phi4, CLI, Chromium, DX plugin, Docker, Gemini, GitHub CLI, Nodejs, Playwright MCP, SafeClaw, Slack, Ubuntu, aliases, authentication, containers, conversation history, dashboard, environment variables, scripts, tmux, ttyd, volume mounts, web terminal
    The google logo   github.com 11 days ago
2030.  HN Show HN: MCP-baepsae – MCP server for iOS Simulator automation
MCP-baepsae is an iOS Simulator automation server tailored for testing iOS applications, particularly beneficial for AI coding agents. It utilizes XCTest private APIs to parse accessibility trees and employs a native Swift bridge to enhance UI operations without the overhead of simctl. The project supports 32 tools designed to meet diverse UI automation requirements across both iOS Simulators and macOS apps. Key features include native Swift integration for improved performance, a comprehensive toolset for various platforms, and a TypeScript MCP layer that facilitates server functionality. Installation can be achieved through npm or directly from the source, with an installer script available to streamline setup on multiple clients. The project necessitates macOS 14+, Xcode + iOS Simulator, Node.js 18+, and Swift 6+, along with accessibility permissions for UI automation features. It supports different runtime environments such as node, npx, bunx, and global, offering manual setup options if the installer script is not utilized. The project's structure includes TypeScript code, native binary output, and test scripts. MCP-baepsae provides end-to-end implementations of 32 tools categorized by platform: iOS Simulator only, macOS only, cross-platform, and utility tools. Usage examples illustrate how to open URLs in the simulator, manage apps, and automate macOS applications. For troubleshooting or architectural discussions, users are encouraged to contact the author. Additional documentation is available in Korean (README-KR.md). Keywords: #phi4, CLI tools, MCP-baepsae, Swift bridge, TypeScript, UI operations, XCTest, accessibility tree, automation, iOS Simulator, macOS app, native binary, simctl, troubleshooting
    The google logo   github.com 11 days ago
2067.  HN Automated AI research setup (Clawdbot/OpenClaw and vibecoding)
The author engineered a lightweight, AI-driven research pipeline by integrating the OpenClaw/Clawdbot system with a vibe-coded, JSON-based scheduler and Gemini CLI, letting experiments run on modest hardware such as a Raspberry Pi while heavy compute is offloaded to cloud coding agents (e.g., Jules). The workflow is triggered via Telegram, where the bot can generate and merge PRs, launch jobs in tmux, and queue them on a mini-cluster. Intent, reproducibility, and results are recorded in a SQLite experiment notebook that supports logs, suggestions, and multi-machine commands, augmented by a Tailscale-hosted dashboard that tracks cluster status and job history and will soon add utilization metrics. The author initially relied on ad-hoc SSH/rsync/tmux scripts that suffered from messy environments and a lack of queuing; the custom tooling added self-repair (e.g., recovering wiped lists from git history), idle-machine use via cron-driven prompts, and autonomous experiment generation when no human input arrives. The whole setup is deliberately low-cost and throwaway, aimed at quickly filtering hypotheses rather than producing polished releases, reflecting the author's emphasis on system design over code, a willingness to accept buggy or false results, and gratitude to a grandfather for the original Raspberry Pi that inspired the project. Keywords: #gpt-oss:20b, Clawdbot, Gemini CLI, JAX, JSON, RL, Raspberry Pi, VPS, compute cluster, jobs, queue, rsync, scheduler, ssh, tmux, vibecoding
    The google logo   jessesilverberg.com 12 days ago
2108.  HN Show HN: Open-source UI components and widgets to build MCP apps for ChatGPT
Show HN presents mcp-ui-starter, an open-source UI component framework for building Model Context Protocol (MCP) applications that can interface with ChatGPT, Claude, Gemini, and other AI clients. The guide walks through cloning the repository, installing dependencies, and launching a local development server that exposes an MCP endpoint at `/mcp` along with Flowbite-powered widgets. The local server is then made public via ngrok (e.g., `ngrok http 3000` yields a URL like `https://<id>.ngrok-free.app/mcp`) and registered with AI platforms: a connector in ChatGPT's Developer mode, a custom connector in Claude's settings, or CLI commands such as `gemini mcp add --transport http <name> "<ngrok-url>/mcp"`, with analogous commands for Cursor, VS Code, Claude Code, Mistral AI, Codex, and other tools; once registered, each platform can discover and use the MCP server's tools. The guide also explains how to create a new widget by adding a server-side component that exports a Zod-validated configuration (e.g., a "basic-text" widget returning "Hello, world!") and a corresponding front-end React component that renders the widget's output, then registering the widget with the server via `.registerWidget()`. Finally, Flowbite UI components can be themed by importing one of the built-in CSS files (Default, Minimal, Enterprise, Playful, Mono) or by customizing Tailwind CSS variables in `index.css`. Keywords: #gpt-oss:20b, AI, Bun, ChatGPT, Flowbite, MCP, NGROK, NPM, Open-source, PNPM, SDK, Skybridge, UI components, Yarn, widgets
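The tunnelling and registration steps quoted above, collected into one sketch; the port and server name are illustrative.

```bash
# Expose the local dev server's /mcp endpoint publicly
ngrok http 3000

# Register the resulting URL with Gemini CLI (command quoted in the summary)
gemini mcp add --transport http my-widgets "https://<id>.ngrok-free.app/mcp"
```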
    The google logo   flowbite.com 12 days ago
2145.  HN Gemini CLI v0.27.0
Gemini CLI v0.27.0 indicates that JavaScript is disabled in the current browser, which prevents the use of x.com, and it advises users to enable JavaScript or switch to a supported browser, directing them to the Help Center for a list of compatible browsers. Keywords: #gpt-oss:20b, Gemini CLI, Help Center, JavaScript, browser, continue, disabled, enable, list, supported browsers, switch, v0270, xcom
    The google logo   twitter.com 12 days ago
2147.  HN Show HN: Agent-smith – Auto-generate AGENTS.md for AI coding assistants
Agent-smith is a zero-config TypeScript CLI (`npx @jpoindexter/agent-smith`) that scans a JavaScript/TypeScript codebase and automatically produces an `AGENTS.md` file, a structured context document AI coding assistants can use to understand a project without manual configuration. It extracts metadata such as component props, complexity, client-only hooks, API routes with auth status, database models and relations, design tokens, and import graphs, and also generates "critical rules" with wrong/right code examples to enforce consistent patterns, yielding roughly 10k tokens of concise, structured context versus 100k+ tokens of raw code. The tool supports multiple output modes (default, compact, compress, minimal, XML, tree), numerous flags for customizing output, dry-run preview, clipboard copying, inclusion of diffs or git logs, splitting large repos, security checks, and monorepo support, plus a built-in MCP server exposing `pack_codebase`, `read_agents`, `search_components`, and `get_component_info` actions for AI assistants. It can be run directly with `npx @jpoindexter/agent-smith`, installed globally, or pointed at specific directory paths; the project is hosted on GitHub at https://github.com/jpoindexter/agentsmith. Keywords: #gpt-oss:20b, AGENTSmd, AI, API routes, Agent-smith, CLI, JSDoc, JSON, Nextjs, Prisma, React components, Remote, Tailwind, TypeScript, Zustand, codebase, components, hooks, shadcn/ui, tRPC
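The documented invocations, with an illustrative directory argument; output-mode flags are omitted because their exact spellings are not given here.

```bash
npx @jpoindexter/agent-smith                  # scan the current repo and emit AGENTS.md
npx @jpoindexter/agent-smith ./packages/web   # point the scan at a specific directory
```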
    The google logo   github.com 12 days ago
2254.  HN Show HN: SafeClaw – a way to manage multiple Claude Code instances in containers
No summary available (error)
       github.com 13 days ago
2290.  HN State of Flutter 2026
Flutter 2026 shifts the rendering engine to default Impeller on Android (API‑29+) and drops Skia support on iOS while retaining Skia for the web; Impeller delivers a 30–50 % reduction in shader‑jank, 20–40 % better text rendering, and cuts the dropped-frame rate from 12 % to about 1.5 %, marking its biggest performance win yet. Developers should migrate immediately, audit widget dependencies for the forthcoming Material/Cupertino split in Q2 2026, and integrate the new v1.0 Flutter AI Toolkit—chat, multi‑turn function calls, speech‑to‑text—and GenUI SDK alpha for LLM‑driven UIs. Benchmarking shows Flutter’s memory usage (25 MB iOS, 14 MB Android) lies between native (smaller) and React Native (larger) systems, while Avalonia’s partnership with Google enables Impeller integration for .NET, delivering power savings and faster starts via Vulkan‑based graphics context with graceful fallbacks to OpenGL ES or Skia. In 2024‑25 the “Flock” fork by former core dev Matt Carroll amplified community frustration over sluggish desktop support and slow PR triage, prompting the official team to accelerate backlog remediation. The professional AI roadmap unveiled at Google I/O 2025 positions Flutter as the platform for “agentic apps” where an LLM dictates UI state, supported by tools such as the Flutter Extension for Gemini CLI, the Dart & Flutter MCP Server, Antigravity’s experimental IDE layer, and Firebase AI Logic SDK. IDE integration extends to Android Studio Meerkat’s Gemini code completion and VS Code/IntelliJ Gemini Assist, while a multimodal Flutter AI Playground showcases text, image, and chat prototypes. The release cadence targets Flutter 3.41 with Dart 3.11 in February 2026, a mid‑2026 release of Flutter 4.0 contingent on core design decoupling, and a schedule of four stable and twelve beta releases with no in‑flight code‑push. Key enhancements include a smaller core, deeper Material Design 3 integration, native‑like desktop UI support, modular app sizes, and further Impeller optimisations. Upcoming priorities encompass migrating the web engine to WebAssembly by H1 2026, standardising Swift Package Manager for iOS plugins, preparing a 10‑foot TV‑optimized layout for LG WebOS in H1 2026, and aligning with new OS releases (iOS 20, Android 17 “Cinnamon Bun”) while embracing fold‑screen and advanced accessibility. The community calendar highlights October 7‑9 2026 as the Next.App DevCon in Berlin for foldable and multi‑window testing and for validating emerging Impeller desktop preview flags. Other milestones involve GenUI’s beta transition, Antigravity IDE preview, the Model Context Protocol for unified IDE/CI AI communication, a refreshed DevTools “Inspector 2.0”, and ongoing build‑time and startup optimisations. Finally, the ecosystem remains attentive to a potential Flutter Foundation for governance, growing IoT interest on Arduino/Raspberry Pi, and a proliferating favorites ecosystem now requiring 2FA, with packages ranging from Rust wrappers to advanced charting libraries—prompting proactive audits and dry‑runs today to stay ahead of the 2026 migration window. Keywords: #gpt-oss:20b-cloud, AI, AOT, Android, Cupertino, Flutter, Impeller, LLMs, Material, Skia, Vulkan, iOS, shader compilation
    The google logo   devnewsletter.com 13 days ago
2313.  HN Show HN: AgentGuard – Open-source security layer for AI agents and skills
AgentGuard is a free, open‑source real‑time security layer for AI agents that blocks malicious skills and prompt‑injection attacks by intercepting dangerous file, terminal, and network operations. Its Layer 1 automatically forbids destructive commands (e.g., `rm -rf /`, fork bombs), protects critical files (.env, .ssh/), blocks data exfiltration to webhooks, and logs the initiating skill. Layer 2 provides on‑demand static analysis of new skills using 24 detection rules that cover secrets, backdoors, obfuscation, prompt injection, and a wide range of Web3 exploits (wallet draining, unlimited approvals, reentrancy, flash‑loan risk, etc.), and it also supplies a trust registry for capability‑based access control. The tool ships with a straightforward npm install or git‑clone setup, offers CLI commands such as `/agentguard scan`, `/agentguard action`, `/agentguard trust list`, `/agentguard report`, and `/agentguard config` to adjust protection levels—strict, balanced, or permissive—and is compatible with Claude Code (via pre/post tool use hooks), OpenAI Codex, Gemini, Cursor, and GitHub Copilot. A recent scan of the example “vulnerable‑skill” repository demonstrates a critical risk level with hits across JavaScript, Solidity, and Markdown, while the upcoming 1.1 release will add Trojan‑skewed `SKILL.md` detection, Markdown scanning, and base‑64 payload decoding. Version 3.0 introduces Markdown capability scanning, an open‑source plugin manifest, a federated trust registry, shared C2 domain/IP blocklists, automated marketplace checks, a VS Code extension, and community rule contributions, all licensed under MIT. Keywords: #gpt-oss:20b-cloud, AI agents, AgentGuard, Deep Scan, Web3, backdoor, credentials, exfiltration, malicious skill, open-source, prompt injection, reentrancy, security layer, wallet draining, webhook
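The slash commands listed above, annotated; these run inside a supported agent (e.g., Claude Code) rather than in a plain shell, and `/agentguard action` is omitted because its arguments are not described here.

```
/agentguard scan          # on-demand Layer 2 static analysis of installed skills
/agentguard trust list    # inspect the capability-based trust registry
/agentguard report        # review blocked operations and findings
/agentguard config        # switch between strict, balanced, and permissive modes
```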
    The google logo   github.com 13 days ago
2326.  HN Show HN: Remote AI coding without moving your code – CloudForge
CloudForge is a web‑based UI that lets users run popular AI coding tools—Claude Code, Codex CLI, Aider, Gemini CLI—directly on their own servers without transferring code off‑premises. By connecting a lightweight, forthcoming open‑source agent, the platform supplies a web terminal via xterm.js and embeds the Monaco editor, removing the need for SSH port forwarding. A free tier supports one Bring‑Your‑Own‑Server (BYOS) instance, and the service includes AI‑auth management for API keys, with one‑click deployment available through its website. Keywords: #gpt-oss:20b-cloud, AI Auth, API keys, Claude Code, CloudForge, Codex CLI, Gemini CLI, Monaco, Remote AI, SSH, Show HN, web UI, xtermjs
    The google logo   cloud-forge.me 13 days ago
2380.  HN Show HN: Toktrack – 1000x faster AI CLI cost tracker (Rust and SIMD)
Toktrack is a Rust‑based, SIMD‑optimized command‑line application that aggregates token usage and cost across Claude Code, Codex CLI, and Gemini CLI, solving the slow throughput of existing tools (over 40 s for 3 GB of JSON logs), data loss caused by Claude Code’s 30‑day session purge, and fragmented logs across multiple interfaces. By leveraging simd‑json and Rayon, it parses up to ~3 GiB/s, yielding a first run in ~1 s and cached queries in ~0.04 s—up to 1000× faster than baselines—while persisting immutable daily summaries in a ~/.toktrack/cache directory that outlasts CLI data deletions. A text‑UI dashboard with four tabs (Overview, Models, Daily, Stats) offers daily, weekly, and monthly breakdowns, and the same command set (e.g., daily, monthly, stats, help) works uniformly across supported CLIs; machine‑readable JSON output is obtainable with a `--json` flag. Installation is straightforward via `npx toktrack` (auto‑downloaded binary) or `cargo install --git https://github.com/mag123c/toktrack`, and prebuilt binaries exist for macOS, Linux, and Windows. Typical usage includes launching the dashboard with `npx toktrack`, querying today’s cost with `npx toktrack daily --json`, or obtaining a monthly summary with `npx toktrack monthly --json`. Navigation uses Tab/Shift+Tab, j/k, with `q` to quit and `?` for help. The cache structure houses per‑CLI daily JSONs and a pricing.json with a 1‑hour TTL; the cold path builds the cache from all files, while the warm path updates only modified files from the last 24 h. By caching immutable summaries, Toktrack preserves usage history against retention policies such as Claude Code’s 30‑day cleanup and Codex CLI’s size caps. Future roadmap includes OpenCode support, with contributions encouraged under the MIT license. Keywords: #gpt-oss:20b-cloud, AI CLI, Claude Code, Codex CLI, Gemini CLI, Rust, SIMD, TUI, Toktrack, benchmarks, cost history, cost summaries, cost tracker, dashboard, parallel, performance, persistent cache, pricing, processing, rayon, simd-json, throughput, token usage
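The commands quoted in the summary, gathered into one block.

```bash
npx toktrack                    # launch the TUI dashboard (binary auto-downloaded)
npx toktrack daily --json       # today's token usage and cost as machine-readable JSON
npx toktrack monthly --json     # monthly roll-up as JSON

# Build from source instead of using the prebuilt binary
cargo install --git https://github.com/mag123c/toktrack
```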
    The google logo   github.com 13 days ago
2408.  HN Evolve SDK – Open-Source Manus Powered by Claude Code, Codex CLI, Gemini CLI
The speaker indicates their willingness to assist in preparing a concise summary and requests the recipient to provide the specific email address that should be included. Keywords: #gpt-oss:20b-cloud, Claude Code, Codex CLI, Evolve SDK, Gemini CLI, Manus, Open-Source, address, contacted, email, feedback, input
    The google logo   github.com 14 days ago
   https://github.com/evolving-machines-lab/manus-evolve   14 days ago
   https://github.com/evolving-machines-lab/evolve   14 days ago
2413.  HN Choosing Antigravity or Gemini CLI
The Antigravity IDE is a full-featured agent manager aimed at users who prefer a graphical workflow. It offers an offline GUI installation with no prerequisites, centralized agent orchestration through a dashboard, a strongly opinionated spec-driven development style with live walkthroughs, and native debugging capabilities. Extensibility comes via Open VSX extensions, MCP, and Agent Skills, all integrated into a single interface that hosts an embedded browser and provides visual feedback and debugging hooks. Gemini CLI, by contrast, excels in lightweight, headless, or script-driven scenarios such as CI/CD pipelines and terminal-based automation. It requires Node.js and installs via `npm install -g @google/gemini-cli`, runs in separate terminals or tmux sessions, is configurable with extensions and Agent Skills, and works either with direct tool calls (e.g., GitHub, gcloud) or in a headless mode that writes output to the console. Both tools are mature, free to try, and can coexist in one workflow; the choice hinges on whether a user prefers an IDE-style visual environment for orchestrating multiple agents or a purely command-line, automation-friendly approach for scriptable, rapid deployment. Keywords: #gpt-oss:20b-cloud, Antigravity, CI/CD, Gemini CLI, IDE, Nodejs, Open VSX, agent manager, agent skills, free tier, headless mode, installation, multiple agents, npm, terminal
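For the CLI route, installation is as quoted above; the one-shot `-p` flag shown for headless use is an assumption about Gemini CLI's prompt mode, not something stated in the summary.

```bash
npm install -g @google/gemini-cli         # install (command from the summary)

gemini                                    # interactive session in the terminal
gemini -p "summarize the failing CI job"  # headless one-shot mode for scripts (flag assumed)
```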
    The google logo   cloud.google.com 14 days ago