Scraper
Spider


2026-03-11 15:25
github copilot
github copilot stories from the last 14 days
31.  HN Enable Code-Mode for all your MCP servers even if they don't support it natively
The Remote MCP Adapter serves as a vital intermediary tool, enabling seamless interaction between clients and remote Model Context Protocol (MCP) servers that lack native support for such connectivity. It effectively addresses challenges in traditional setups by facilitating file uploads from clients to tools and capturing generated files back to the client without requiring shared filesystems. Among its key features are multiserver relay capabilities, which expose multiple upstream MCP servers under a single gateway; code mode providing a unified interface for coding agents to discover and execute tools across any server; and comprehensive file handling that stages files for tool access while capturing artifacts like screenshots or PDFs for client retrieval. Additionally, the adapter enhances functionality with session management options, including isolation, time-to-live cleanup, and optional session revival. It supports various state backends such as in-memory storage, SQLite, and Redis, alongside upstream health monitoring through active checks and a circuit breaker to prevent failure cascades. The resilience of the system is bolstered by retry mechanisms for handling dropped upstream sessions. Security is maintained through bearer tokens and signed upload URLs, while observability is assured with OpenTelemetry metrics collection and optional log export features. The adapter also emphasizes safe storage practices, including atomic writes, orphan file cleanup, and quota enforcement. Deployment can be achieved using Docker Compose or Helm charts for Kubernetes environments, necessitating a shared common storage directory between the adapter and upstream servers. Although minimal configuration suffices due to safe defaults, detailed setup guidance is available on its MkDocs site. The latest version introduces features like tool hiding per server and configurable upload consumer tool descriptions, all under the MIT license. 
Keywords: #phi4, Adapter, Artifacts, Authentication, Backends, Checks, Code-Mode, Compose, Deployment, Docker, Docker Compose, MCP, File, File Uploads, Health, Health Checks, MCP servers, Observability, Remote, Remote Adapter, Resilience, Servers, Sessions, State, State Backends, Uploads
    github.com 3 hours ago
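The adapter's upstream health monitoring pairs active checks with a circuit breaker to stop failure cascades. As a minimal sketch of that pattern (not the adapter's actual implementation; thresholds and naming here are assumptions), a consecutive-failure breaker looks like:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    errors and rejects calls until `reset_timeout` seconds have passed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast instead of hammering an unhealthy upstream.
                raise RuntimeError("circuit open: upstream considered unhealthy")
            # Half-open: allow one probe call through.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A retry layer, as the summary mentions, would wrap `call` and treat the "circuit open" error as a signal to back off rather than retry immediately.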
38.  HN Spring CRUD Generator v1.5.0: CI tests, Set relations, Copilot support
Spring CRUD Generator version 1.5.0 brings numerous improvements aimed at enhancing the development experience and maintaining code quality. The release ensures enhanced specification consistency while incorporating Continuous Integration (CI)-backed integration tests that are instrumental in identifying and mitigating code inconsistencies early on. It also places a strong emphasis on usability, evident from the updated documentation provided to users. In terms of backward compatibility, the version deprecates `basepath` in favor of `basePath`, ensuring smoother transitions for developers upgrading their systems. New features include support for generating Set-based relations through `relation.uniqueItems`, addressing previously missing imports needed for JSON collections. The update also boosts productivity with improved GitHub Copilot and autocomplete functionalities that facilitate coding tasks. Moreover, a security policy has been introduced to guide users on how to report security vulnerabilities, thereby enhancing the framework's overall reliability and trustworthiness. Keywords: #phi4, CI, CI tests, CRUD, Copilot, GitHub, GitHub CI, GitHub Copilot, JSON, JSON collections, ManyToMany, ManyToMany relations, Spring, OneToMany, SECURITY.md, Spring CRUD Generator, autocomplete, backward compatibility, business services, collections, consistency, deprecated, imports, integration, integration test coverage, relation, relation set support, security policy, set, spec, spec consistency, support, test coverage, tests
    github.com 3 hours ago
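The deprecation of `basepath` in favor of `basePath` follows a common pattern: accept the old key with a warning, prefer the new one. A hedged sketch in Python (the generator itself is a Spring/Java tool; the function name and default here are illustrative, not its real API):

```python
import warnings


def resolve_base_path(config: dict) -> str:
    """Prefer the new `basePath` key, but keep accepting the
    deprecated `basepath` spelling with a warning so existing
    configurations continue to work during the transition."""
    if "basePath" in config:
        return config["basePath"]
    if "basepath" in config:
        warnings.warn("`basepath` is deprecated; use `basePath`",
                      DeprecationWarning)
        return config["basepath"]
    return "/"  # hypothetical fallback default
```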
52.  HN Sign in with ANY password into Rocket.Chat EE, found by our open source AI agent
The blog post details GitHub Security Lab's open-source AI-driven taskflows for finding high-impact web security vulnerabilities in projects such as Rocket.Chat EE. The taskflows use a Large Language Model (LLM) to streamline vulnerability detection, filtering out false positives and reducing the manual verification burden. Over 80 high-impact vulnerabilities have been reported through these methods, several already publicly disclosed. The taskflows work by segmenting repositories into components, assessing entry points for untrusted input, suggesting potential vulnerabilities through context-aware threat modeling, and auditing each suggestion against strict criteria before it is reported as a security issue. High-impact findings include an authorization bypass in Outline (CVE-2025-64487), sensitive data exposure in e-commerce platforms (CVE-2025-15033, CVE-2026-25758), and a password authentication bypass in Rocket.Chat EE (CVE-2026-28514). The taskflows excel at detecting logical bugs such as IDOR and business logic issues rather than purely technical ones, demonstrating their capacity for understanding code context and threat models. Findings show that LLMs are effective at filtering out low-severity false positives and conducting thorough threat modeling across various application types. As an open-source framework, the taskflows can be adopted, adapted, or extended by the security community for purposes beyond vulnerability discovery; the authors invite contributions and discussion through their repository.
Keywords: #phi4, CSRF, CVE identifiers, GitHub Copilot, GitHub Security Lab, IDOR, LLMs, RocketChat, SSRF, XSS, auditing, authentication issues, authorization bypasses, business logic issues, code analysis, command injection, false positives, hallucinations, information disclosure, open source, prompt engineering, remote code execution, seclab-taskflow-agent, security misconfiguration, security research, taskflow design, taskflows, threat modeling, vulnerabilities, web applications
    github.blog 4 hours ago
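The auditing stage described above, which checks LLM suggestions against strict criteria before reporting, can be sketched as a filter. The criteria here (untrusted entry point, severity floor) are hypothetical simplifications of the taskflows' real audit rules:

```python
def triage(suggestions, min_severity="high", require_untrusted_input=True):
    """Keep only suggested findings that trace back to an untrusted
    entry point and meet a severity bar, discarding likely false
    positives. Suggestions are plain dicts in this sketch."""
    order = {"low": 0, "medium": 1, "high": 2, "critical": 3}
    kept = []
    for s in suggestions:
        if require_untrusted_input and not s.get("entry_point_untrusted"):
            continue  # no path from attacker-controlled input
        if order[s["severity"]] < order[min_severity]:
            continue  # below the reporting threshold
        kept.append(s)
    return kept
```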
59.  HN Elastic Docs Skills
Elastic Docs Skills offers a catalog of Claude Code automation tools designed to streamline Elastic documentation workflows. Users can browse and install these skills with a single command, either directly from GitHub via `curl` (with optional flags for listing available skills or installing all at once) or via the `npx` CLI, which installs a skill by specifying details such as group and version. Contributors can clone the repository and run it locally, creating new skills either with a dedicated command inside the repo or by manually adding a `SKILL.md` file. The skills follow Semantic Versioning: major updates indicate breaking changes, minor ones add new features, and patches fix bugs; installed skills can be updated with a dedicated curl command. Continuous Integration (CI) validation via GitHub Actions ensures that pull requests keep YAML frontmatter and JSON structures valid. The repository contains directories for the skills themselves, validation workflows, and an installer script with a Text User Interface (TUI). Elastic Docs Skills is distributed under the Apache License, Version 2.0, with contribution guidelines in the `CONTRIBUTING.md` file. Keywords: #phi4, CLI, Catalog, Contributing, Docs, Elastic Docs, GitHub, License, Elastic, PRs, Repository, SemVer, Skills, Validation, Versioning, YAML
    github.com 4 hours ago
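The Semantic Versioning rules the catalog follows (major = breaking, minor = new features, patch = bug fixes) can be sketched as a small classifier. This is illustrative only; the installer's real update logic may differ:

```python
def bump_kind(installed: str, available: str) -> str:
    """Classify an available skill update under SemVer, comparing
    dotted MAJOR.MINOR.PATCH version strings numerically."""
    old = [int(p) for p in installed.split(".")]
    new = [int(p) for p in available.split(".")]
    if new[0] != old[0]:
        return "major (breaking changes)"
    if new[1] != old[1]:
        return "minor (new features)"
    if new[2] != old[2]:
        return "patch (bug fixes)"
    return "up to date"
```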
93.  HN Agent-debate – AI agents review code by editing a shared Markdown file
Agent-debate is a collaborative code review tool where multiple AI agents—such as Claude, Codex, Gemini, and Copilot—work together by editing a shared Markdown file to conduct structured debates on technical decisions. These agents use evidence from the codebase to support their arguments in an adversarial process that ensures comprehensive analysis of dependencies and assumptions. Each agent is required to provide precise file:line citations for any claims they make and to track disputes within a log, allowing them to either reach consensus or escalate unresolved issues. To prevent scope creep, the tool mandates justification for every proposed addition, with unrelated ideas temporarily set aside in a "parking lot" until deemed relevant. Ultimately, users have the final decision-making authority after agents have converged on recommendations. The system accommodates both manual and automated modes; an orchestrator manages agent interactions through rounds of discussion until consensus is reached or a predetermined number of rounds concludes. Installation requires executing a script from GitHub with customizable options for selecting specific agents. Users can configure default agents and adjust debate parameters to suit their needs. However, the tool has some limitations: it depends on local command-line interface behavior and may incur costs associated with certain providers, particularly for premium features like those offered by Copilot. Agent-debate operates under the MIT license, ensuring open-source flexibility. Keywords: #phi4, AI agents, Agent-debate, Markdown file, Python wrapper, adversarial, code review, configuration, convergence, dependencies, evidence, installation, license, limitations, usage
    github.com 7 hours ago
   https://github.com/gumbel-ai/agent-debate/blob   6 hours ago
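The orchestrated mode described above, running agents through rounds of discussion until consensus or a round budget, can be sketched as a loop. The agents here are plain callables standing in for the real CLI wrappers; the transcript stands in for the shared Markdown file:

```python
def run_debate(agents, max_rounds=5):
    """Round-based orchestration: each agent appends to a shared
    transcript and votes on consensus; stop when all agree or the
    round budget is exhausted (then escalate to the user)."""
    transcript = []
    for round_no in range(1, max_rounds + 1):
        votes = []
        for name, agent in agents.items():
            entry, agrees = agent(transcript)  # agent reads shared state
            transcript.append(f"[{name} r{round_no}] {entry}")
            votes.append(agrees)
        if all(votes):
            return "consensus", round_no, transcript
    return "escalate", max_rounds, transcript
```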
95.  HN So You Want to Do Agentic Development
By 2026, agentic development has become prevalent, focusing on mature toolsets like VS Code integrated with GitHub Copilot and other free tools such as Mistral Vibe, while advising caution against costly subscriptions. Privacy remains a top priority, with an emphasis on sandboxing to protect personal data from being used within agent tools due to security risks. Contrary to some beliefs about "local AI," cloud-based models continue to offer superior performance. Project initiation involves creating a SPEC.md document that is continuously refined in collaboration with agents, emphasizing the importance of clear specifications over rigid requirements. To support these projects, SKILL.md files provide additional guidelines, and there's an increasing trend of agents developing their own skills. A structured workflow includes the creation of PLAN.md for dynamic project management throughout development. Effectively directing agent activities is key, employing strategies such as TDD-like testing and static analysis to guide and refine code generation. Languages with strong typing like Go and TypeScript are favored due to their self-correcting features. Future advancements aim to boost agents' autonomy and facilitate collaboration among them, alongside improvements in sandboxing practices to enhance security. Keywords: #phi4, Agentic Development, GitHub Copilot, Language Matters, PLAN.md, Privacy, SKILL.md, SPEC.md, Sandbox, Security, Steering, Tooling, VS Code, Workflow
    taoofmac.com 7 hours ago
124.  HN NovAI Coder – Free Copilot Alternative Using Chinese AI Models
NovAI Coder is presented as a cost-effective, open-source alternative to GitHub Copilot, offering powerful Chinese AI models like DeepSeek V3.2, Qwen, and GLM-4 at approximately 10% of competitors' prices. It features an easy setup on Windows requiring no configuration and provides $0.50 in free credits upon registration. Users benefit from access to seven AI models, real-time credit balance tracking, ultra-low latency through its Hong Kong-based API server, and compatibility with the OpenAI API for seamless integration into custom tools. The platform emphasizes privacy by foregoing KYC processes and accepts PayPal or USDT as payment methods. Built using Electron and the OpenClaw coding agent, NovAI Coder aims to expand support to macOS and Linux in addition to a planned VS Code extension. With its MIT license, it encourages free use and modification, positioning itself as an affordable AI coding assistant for developers who prefer minimal financial investment. Keywords: #phi4, AI Assistant, AI Coding Assistant, AI Models, API Gateway, Coding Benchmarks, DeepSeek V3, Developer Tools, NovAI Coder, Electron, Free Credits, GLM, GitHub Alternative, GitHub Copilot Alternative, Hong Kong Servers, Linux Support, MIT License, Open Source, OpenClaw, OpenClaw Agent, PayPal, Privacy-First, Qwen, USDT, Ultra-Low Latency, VS Code Extension, macOS Support
    github.com 11 hours ago
229.  HN I built a programming language using Claude Code
Over four weeks, an author developed a programming language named Cutlet using Claude Code, demonstrating agentic engineering by enabling Claude to autonomously generate all code without human intervention. The project tested the capabilities of large language models (LLMs) like Claude, revealing their potential in software development while also highlighting certain limitations, such as missing features including file I/O and error handling. Designed for macOS and Linux, Cutlet incorporates basic functionalities like arrays, strings, and functions. The author’s objective was to minimize human oversight while testing Claude's abilities, emphasizing the need for problem definitions that leverage LLM strengths, clear communication, and supportive environments with efficient iterative processes. Tools developed alongside Cutlet, such as comprehensive testing suites and memory safety checks, facilitated Claude’s autonomous improvement of the language, showcasing both successes and challenges inherent in AI-driven projects. While the project yielded successful outcomes, it prompted reflection on the author's role when using AI tools, raising questions about the evolving nature of software engineering with LLMs. The addictive potential of such tools was acknowledged as a concern for mental health. Cutlet offers rapid experimentation opportunities and reduces reliance on external libraries but leaves broader societal impacts largely unaddressed. Development on Cutlet is set to pause while the author pursues new work opportunities, though minor updates may continue. This experiment highlights both the transformative possibilities and challenges posed by generative AI in programming, suggesting a significant shift in how software development might evolve with increasing LLM integration. 
Keywords: #phi4, Claude Code, Cutlet, Docker, GitHub Copilot, LLM-assisted programming, REPL, agentic engineering, arrays, dynamic language, functions, memory safety tools, meta-operator, programming language, software engineering, strings, test suite
    ankursethi.com a day ago
   https://en.wikipedia.org/wiki/Hang_the_DJ   a day ago
   https://www.youtube.com/watch?v=Mcr7G1Cuzwk   a day ago
   https://balsa.info   a day ago
   https://news.ycombinator.com/newsguidelines.html   a day ago
   https://code.claude.com/docs/en/model-config#exten   21 hours ago
   https://www.google.com/search?q=ab+initio+dml+language   21 hours ago
   https://github.com/t3rmin4t0r/magic-partitioning   21 hours ago
   https://www.copyright.gov/rulings-filings/review-board&   21 hours ago
   https://newsroom.loc.gov/news/copyright-office-releases   21 hours ago
   https://www.anthropic.com   21 hours ago
259.  HN Emacs and Vim in the Age of AI
The article examines how artificial intelligence (AI) could influence Emacs and Vim, two established text editors with strong user communities, in a landscape where modern IDEs like VS Code are rapidly incorporating AI features. While acknowledging the potential threat posed by these dominant platforms, it highlights unique opportunities for Emacs and Vim to leverage AI technologies despite facing significant challenges. The risks outlined include the growing appeal of AI-integrated IDEs such as VS Code, which may divert users from traditional editors due to their seamless AI integration. Additionally, with AI increasingly handling coding tasks, the inherent advantages of Emacs and Vim in manual editing might diminish. The backing of tools like VS Code by major companies and venture capital creates a competitive environment that is challenging for community-driven projects such as Emacs. Despite these challenges, opportunities exist for AI to lower barriers to customization through simplifying code translation into languages like Elisp or Lua, potentially attracting more contributors and engaging the community further. There are already strong AI integrations within Emacs and Neovim which can be expanded, with Emacs's multifunctional nature offering particular advantages for cross-domain AI applications beyond coding itself. Moreover, AI could assist users in troubleshooting complex configuration issues, drawing back those who previously left due to such difficulties. The article also touches on ethical considerations surrounding AI usage, including environmental impact and job displacement concerns, emphasizing the importance of these discussions within the community. Ultimately, it argues that the future of Emacs and Vim hinges not merely on incorporating advanced AI features but on their communities' ability to adapt and innovate continuously. 
Engagement and proactivity among users are crucial in ensuring these editors remain relevant despite changes in the technological landscape. Keywords: #phi4, AI, Copilot, Elisp, Emacs, IDEs, Neovim, VS Code, Vim, VimScript, automation, community, configuration, ethical concerns, extension languages, integration, keybindings, learning curve, open-source, plugins, productivity, programming
    batsov.com a day ago
271.  HN You Bought the AI Licenses. Why Is Only One Developer Getting 10x Results?
The article highlights a prevalent issue within organizations that have invested significantly in AI tools but experience varying levels of success due to disparities in configuration optimization among developers. The root cause is identified as the undocumented and non-distributed context, such as custom rules and agent skills, that high-performing developers utilize, which prevents others from achieving similar results despite access to advanced tools like Cursor, Claude, and Copilot. Prominent companies including Google and Atlassian struggle with effective AI knowledge sharing due to inadequate centralized infrastructure for configuration distribution. Current solutions, such as using Git for versioning or relying on vendor-specific marketplaces, fall short in terms of scale, leading to fragmented knowledge without proper organizational governance and scalability. These challenges impede consistent implementation across different tools and repositories. To combat these issues, Skills.new has been developed as a platform that captures AI knowledge once, categorizes it with built-in governance, and distributes it universally within an organization. This ensures configurations remain current, secure, and accessible, thereby enabling developers and autonomous agents to work effectively using the appropriate context. Ultimately, while AI tools themselves are becoming commoditized, the true competitive edge lies in a structured knowledge layer that enhances their effectiveness. Skills.new addresses this by providing a centralized system for managing and distributing AI skills across engineering teams, thus facilitating improved collaboration and performance within organizations. Keywords: #phi4, AI Agents, AI Licenses, AI Tools, Configuration Gap, Contextual Knowledge, Developer Productivity, Engineering Organizations, Governance, Marketplaces, Skill Sharing, Skills.new, Token Management
    skills.new a day ago
289.  HN Hooking Coding Agents with the Cedar Policy Language
The article addresses strategies for mitigating security risks posed by autonomous coding agents within enterprise settings, particularly those interacting with sensitive data and executing actions autonomously. The increasing vulnerabilities demand structured solutions to effectively understand and mitigate these issues. A proposed method involves using the Cedar Policy Language, which enables deterministic control over agent behaviors through runtime hooks that monitor trajectory events—comprising agent actions and system responses—to enforce security boundaries via a Reference Monitor characterized by always being invoked, tamper-proof, and verifiable. The framework maps various risks like data exfiltration or remote code execution onto this event model for comprehensive threat modeling. Cedar's expressiveness and support for permission models make it suitable for enforcing policies that are both deterministic and auditable, contrasting with the opaque decision-making processes of large language models (LLMs). Policies can be articulated in multiple forms, translating security guidelines into executable code to balance safety and functionality within coding agent operations. The architecture also incorporates Hook Adapters and a Harness Service, which process and authorize events using Cedar policies. Looking forward, enhancements are planned for the policy engines to improve scalability and manage stateful policies across interactions while maintaining a balance between security measures and the utility of coding agents. This approach marks a shift from solely relying on LLM alignment towards establishing robust, adaptable security frameworks that evolve with the capabilities and autonomy of coding agents. Keywords: #phi4, Attribute-Based Access Control, Cedar Policy Language, Coding agents, OWASP Top 10, Reference Monitor, deterministic controls, hooks, information flow control, lethal trifecta, policy enforcement, security boundaries, trajectory event model
    blog.sondera.ai a day ago
   https://github.com/sondera-ai/sondera-coding-agent-hook   a day ago
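The deny-by-default Reference Monitor described above can be sketched in Python. Cedar is its own policy language, so the predicate-style policies below are an illustrative stand-in, keeping two Cedar-like properties: no matching policy means deny, and an explicit forbid overrides any permit:

```python
def authorize(event, policies):
    """Check one trajectory event against a policy list.
    Deny by default; a matching forbid wins over any permit."""
    decision = "deny"
    for policy in policies:
        if not policy["when"](event):
            continue  # policy does not apply to this event
        if policy["effect"] == "forbid":
            return "deny"  # explicit forbid is final
        decision = "allow"
    return decision


# Hypothetical policies for a coding agent's runtime hooks.
policies = [
    {"effect": "permit",
     "when": lambda e: e["action"] == "read_file"
                       and e["path"].startswith("/workspace/")},
    {"effect": "forbid",
     "when": lambda e: e["action"] == "exec"
                       and "curl" in e.get("command", "")},
]
```

A hook adapter would call `authorize` on every event before passing it through, giving the always-invoked, auditable enforcement point the article describes.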
298.  HN Show HN: Crit – Review AI agent work like you review PRs
Crit is a command-line tool aimed at enhancing the efficiency and effectiveness of reviewing AI-generated content, such as plans and code. It addresses the cumbersome manual review process by offering a browser-based interface that supports GitHub-style inline comments for easy feedback and iteration. Key features include structured feedback that formats comments into prompts ready to be pasted back to AI agents, diff viewing for highlighting changes between document iterations, and support for both specific file reviews and git diffs in repositories. Crit integrates seamlessly with popular AI coding tools like Claude Code, Cursor, and GitHub Copilot through drop-in configurations. Installation is straightforward across various platforms using methods such as Homebrew on macOS/Linux, Go or Nix commands, or by downloading a standalone binary without additional dependencies. The tool supports usage scenarios including reviewing specific files directly, automatic detection of changed files in git repositories for review, and concurrent reviews by running instances on different ports. Additional features facilitate user experience with options like asynchronous sharing of reviews, Vim keybindings for navigation, theme selection, and auto-save functionality. Crit’s integration capabilities automate the review loop with major AI coding tools, simplifying workflows involving AI-generated content. Built using Go 1.26+, it includes a comprehensive end-to-end test suite utilizing Playwright to ensure robust performance across platforms and scenarios, ultimately making the review process of AI-generated documents more user-friendly and efficient. Keywords: #phi4, AI agent, CLI, Crit, Docker, Git, GitHub-style, Mermaid diagrams, PRs, Playwright tests, Vim keybindings, browser-based UI, code review, diff, environment variables, inline comments, markdown, real-time output, syntax highlighting
    github.com a day ago
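Crit's structured feedback formats inline comments into a prompt ready to paste back to an agent. A minimal sketch of that idea (the exact output format is an assumption, not Crit's actual one):

```python
def comments_to_prompt(comments):
    """Turn GitHub-style inline comments (file, line, body) into a
    single paste-ready prompt, ordered by file and line number."""
    lines = ["Please address the following review feedback:", ""]
    for c in sorted(comments, key=lambda c: (c["file"], c["line"])):
        lines.append(f"- {c['file']}:{c['line']}: {c['body']}")
    return "\n".join(lines)
```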
305.  HN Show HN: Sandboxing Agents on macOS and Linux with Nix
The document introduces "agent-sandbox.nix," a declarative sandboxing tool designed for AI agents operating on macOS and Linux, which focuses on enhancing security by limiting file and network operations within the agent's execution environment. It employs `bubblewrap` on Linux to isolate processes from their host machines through namespace unsharing, while macOS utilizes `sandbox-exec` to implement a strict "deny-default" policy that restricts default permissions. Key features include the ability to control read/write access to specific directories and files, such as the current working directory and declared state directories/files. The sandbox offers unrestricted network access for API interactions but enforces restrictions on file system operations by allowing binaries from specified packages (`allowedPackages`) and environment variables (`extraEnv`), while eliminating any existing host environment configurations. Users can set up a development shell for AI tools like Claude through examples provided in `flake.nix` and `shell.nix`, requiring the configuration `NIXPKGS_ALLOW_UNFREE=1` due to restrictions on non-free software. Authentication within this secure environment relies on runtime-evaluated tokens stored in environment variables, ensuring they are not permanently embedded in the Nix store. The document provides guidance for configuring state directories essential for tool dependencies and offers a method for debugging via a bash wrapper that mirrors sandbox configurations, facilitating interactive exploration of the environment. Despite its robust security framework, limitations include blocking Git push operations due to `$HOME` masking and prohibiting SSH key access unless explicitly permitted through environment variables. 
Keywords: #phi4, /nix/store, AI agents, CLI-based, Git pushes, Linux, Nix, NixOS, Sandboxing, allowedPackages, authentication, bubblewrap, configuration files, debugging, declarative, deny-default, environment variables, ephemeral, extraEnv, flake, isolation, macOS, network access, packages, permissions, runtime evaluation, sandbox-exec, secrets management, security policy, shell.nix, stateDirs, stateFiles, tmpfs, token-based auth
    github.com a day ago
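The deny-default file policy described above permits writes only under the working directory and declared state directories. A sketch of that check, assuming Linux-style paths (the real enforcement is done by `bubblewrap` and `sandbox-exec`, not Python):

```python
from pathlib import Path


def may_write(path, cwd, state_dirs):
    """Deny-default write check: allow only paths inside the current
    working directory or an explicitly declared state directory."""
    p = Path(path).resolve()
    for allowed in [Path(cwd)] + [Path(d) for d in state_dirs]:
        allowed = allowed.resolve()
        # Allow the directory itself or anything beneath it.
        if p == allowed or allowed in p.parents:
            return True
    return False  # everything else, including $HOME, is denied
```

This mirrors why Git pushes fail in the sandbox: `$HOME` (and thus SSH keys) falls outside every allowed prefix unless explicitly declared.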
318.  HN Levels of Agentic Engineering
The article presents an eight-level framework called "Agentic Engineering," designed to integrate artificial intelligence (AI) into software engineering workflows effectively. As AI models advance, the challenge lies in bridging the gap between their potential capabilities and practical application within product development. **Levels 1-3** focus on basic code completion through tools like GitHub Copilot, progressing to context-sensitive coding via IDEs that merge chat functionality with codebases, enhancing developers' efficiency and contextual understanding. **Level 4** emphasizes "context engineering," which involves refining system prompts and managing conversation histories to increase the information density of AI interactions, crucial for improved performance. In **Level 5**, termed "compounding engineering," learned enhancements are systematically codified for future use, employing tools like Model Context Protocol (MCP) servers and custom skills that deepen LLMs' interaction with development environments, databases, and APIs. As the framework advances to **Levels 6-7**, it introduces "harness engineering," which creates supportive environments where AI agents operate autonomously through feedback mechanisms and security boundaries, minimizing human oversight. This includes orchestrating background tasks via dispatch systems such as Dispatch or Inspect, utilizing various models to capitalize on their unique strengths. **Level 8** envisions direct multi-agent coordination without central orchestration, allowing AI agents to collaborate directly on complex projects like developing compilers or migrating large codebases. However, this level is largely theoretical due to challenges in managing risks and resources efficiently. The article suggests that most software engineering tasks currently benefit from the autonomy and coordinated efforts described at Level 7.
It also proposes a future step of transitioning from text-based interactions with AI systems to more intuitive voice-to-voice interfaces for developers. Overall, the emphasis remains on iterative improvements rather than pursuing perfect one-shot solutions in AI-assisted coding. Keywords: #phi4, AI-assisted coding, Agentic Engineering, Claude Code, MCP servers, SWE-bench, background agents, compounding engineering, context engineering, dispatching work, multi-agent coordination, orchestrator LLM, productivity metrics, skills
    www.bassimeledath.com a day ago
   https://factory.strongdm.ai/techniques   21 hours ago
   https://factory.strongdm.ai/products/attractor#communit   21 hours ago
   https://github.com/search?q=strongdm+attractor&type=repo   21 hours ago
   https://github.com/strongdm/attractor/forks   21 hours ago
   https://sibylline.dev/articles/2026-01-27-stop-orchestr   21 hours ago
   https://github.com/berserkdisruptors/contextual-commits   19 hours ago
340.  HN Emacs and Vim in the Age of AI
The article delves into the transformative influence of artificial intelligence (AI) on classic text editors Emacs and Vim, highlighting both potential risks and opportunities for these tools in an era increasingly dominated by AI-enhanced programming environments. The author draws from extensive personal experience with Emacs and recent exposure to Vim to contextualize the shifts brought about by AI integration. One primary risk is the dominance of Integrated Development Environments (IDEs) like VS Code, which are incorporating advanced AI features, potentially drawing users away from Emacs or Vim due to their enhanced capabilities. This shift challenges the traditional appeal of these editors, particularly as mechanical editing speed becomes less critical in favor of skills related to specifying intent and evaluating outputs—skills not inherently supported by Emacs or Vim. Furthermore, well-funded projects have significant advantages over volunteer-driven communities like those supporting Emacs and Vim, creating a disparity in resource availability for AI integration. A speculative concern is the potential for programming tasks to become fully automated, threatening the relevance of coding editors altogether. However, opportunities also emerge from this technological evolution. AI could simplify the process of configuring and extending Emacs and Vim by translating plain language requests into executable code, thus lowering barriers to customization. Additionally, AI tools might facilitate community growth by easing entry points for new contributors and assisting maintainers with tasks such as documentation. Both editors already have foundational AI integrations that can be expanded, leveraging their inherent extensibility to integrate AI more seamlessly within user workflows. 
Emacs, in particular, is noted for its versatility beyond programming, functioning effectively across various non-coding tasks, which could provide resilience even if traditional coding roles diminish. The article also addresses ethical considerations such as the environmental impact of AI model energy consumption and copyright issues related to training data—concerns that are particularly pertinent within open-source communities. Ultimately, while AI poses significant challenges for Emacs and Vim, there are substantial opportunities for adaptation and innovation. The continued relevance and survival of these editors will depend not only on technological advancements but also on active community engagement and the resolution of ethical issues. Keywords: #phi4, AI, Copilot, Elisp, Emacs, IDEs, Neovim, VS Code, Vim, VimScript, automation, community, configuration, ethical concerns, extension languages, integration, keybindings, learning curve, open-source, plugins, productivity, programming
    The google logo   batsov.com a day ago
359.  HN TLAi+ Benchmarks for Evaluating LLMs
The TLAi+Bench is a comprehensive dataset and benchmark suite developed to evaluate Large Language Models (LLMs) on tasks related to TLA+ formal specifications, addressing both logic puzzles and real-world scenarios. Created to fulfill the need for standardized benchmarks within the TLA+ community, it arose from initiatives like the TLA+ Dataset Issue and the TLAi+ Challenge by the TLA+ Foundation. The primary purpose of TLAi+Bench is to provide consistent evaluation metrics for LLMs on formal specification tasks while also serving as a reference for developing AI-assisted tools in TLA+ development. Additionally, it supports research in formal methods and AI, offering educational resources through practical problems. The repository structure includes puzzle descriptions that require formal specifications, such as the River Crossing and Game of Life puzzles, along with gold standard TLA+ specifications to serve as references. It also features GenAIScript utilities designed for AI-assisted specification generation from natural language inputs to TLA+. The benchmark encompasses a range of puzzle categories, including Logic Puzzles, Concurrency, Algorithms, Games & Strategy, Mathematical Structures, and Simulation. To utilize the benchmarks, certain prerequisites are necessary: VSCode with the TLA+ extension, an X11 server for headless environments, Node.js 24+, and specific tools like tla2tools.jar. The GenAIScript is employed to automate the generation and verification of specifications using various LLM providers. Running these benchmarks involves reading puzzle descriptions, generating specifications, performing syntax checks, model verification, and comparing outputs with gold standards. This process includes TLC counterexample analysis, refinement checking, behavioral equivalence, and property satisfaction. The project encourages community engagement through contributions like new puzzles, evaluation tools, documentation enhancements, and validation efforts. 
It recognizes the TLA+ Foundation's mission, celebrates challenge winners, and appreciates the broader TLA+ community's contributions. As an open-source initiative under the MIT License, TLAi+Bench fosters collaboration and innovation in AI-assisted formal methods development. Keywords: #phi4, AI-assisted development, GenAIScript, GitHub Copilot, Large Language Models, TLA+, TLAi+ Challenge, behavioral equivalence, benchmarks, counterexample analysis, evaluation criteria, formal specification, logic puzzles, model checking, property satisfaction, real-world scenarios, refinement, verification
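The verification pipeline described above — syntax check first, then model verification — can be sketched as a small harness around tla2tools.jar. `tla2sany.SANY` and `tlc2.TLC` are the standard entry points shipped in that jar; the injectable `runner` parameter is an assumption added here so the sketch can be exercised without a JVM.

```python
import subprocess

def check_spec(spec_path, tla2tools="tla2tools.jar", runner=subprocess.run):
    """Syntax-check a TLA+ spec with SANY, then model-check it with TLC.

    Returns a dict mapping each stage to its exit code. Stops early if
    parsing fails, since model-checking an unparseable spec is pointless.
    """
    results = {}
    for stage, main_class in [("syntax", "tla2sany.SANY"),
                              ("model", "tlc2.TLC")]:
        proc = runner(["java", "-cp", tla2tools, main_class, spec_path],
                      capture_output=True, text=True)
        results[stage] = proc.returncode
        if proc.returncode != 0:  # fail fast on the first broken stage
            break
    return results
```

Comparing the checked spec against a gold standard (refinement, behavioral equivalence) would be further TLC runs layered on the same pattern.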
    The google logo   github.com a day ago
391.  HN Agentic Harness Bootstrap
The "Agentic Harness Bootstrap" is a sophisticated tool crafted for facilitating AI-driven code generation, offering an automated method to create essential project artifacts. It seamlessly integrates with popular AI coding platforms such as Claude Code, OpenAI Codex, and GitHub Copilot, enabling users to generate agent instruction files, architecture maps, CI pipelines, lint configurations, and pre-commit hooks through a simple command after cloning its repository. Operating in four phases—discover, analyze, generate, and verify—the tool produces customized outputs without altering existing user customizations. Key functionalities include the creation of CLAUDE.md, AGENTS.md, ARCHITECTURE.md for instructions; task runner scripts; pre-commit hooks; lint configurations; verification scripts; ADR directories; and CI integration pipelines. Its adaptability allows it to tailor its output depending on whether a project is new (greenfield) or existing (brownfield), and its idempotent nature ensures safety in repeated use without affecting current customizations. The tool adheres to specific engineering principles, including deterministic verification for automated checks of agent outputs; semantic linting that offers fix instructions within linter messages; three-tier boundaries defining action categories for harness behavior; fail-fast feedback by prioritizing swift initial checks like linting and type checking; and utilizing architecture as a navigational map without delving into underlying reasons. The repository structure incorporates instruction files, maps, CI configurations, and examples for various stacks such as Go microservices, PHP/Laravel applications, and React single-page applications (SPAs). It exemplifies its principles through the validation of templates and maintenance of example integrity via CI pipelines, ultimately creating a controlled environment for AI agents to generate code reliably at scale. 
Keywords: #phi4, AI Coding Tools, Agentic Harness, Agents, Architecture, Bootstrap, CI Pipelines, Deterministic Verification, Idempotency, Lint Configs, Pre-commit Hooks, Repo Structure, Semantic Linting
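The idempotency guarantee described above — regenerate freely, never clobber user customizations — can be sketched with a marker check. The marker string and function below are illustrative assumptions, not the tool's actual mechanism.

```python
from pathlib import Path

# Hypothetical marker; how the real tool detects its own output may differ.
MARKER = "<!-- generated by agentic-harness-bootstrap -->"

def write_artifact(path: Path, content: str) -> str:
    """Generate a harness file idempotently.

    The file is (re)written only if it is absent or still carries the
    generation marker; anything else is treated as a user customization
    and left untouched, so repeated runs are safe.
    """
    if path.exists() and MARKER not in path.read_text():
        return "skipped"
    path.write_text(f"{MARKER}\n{content}")
    return "written"
```

A first run writes AGENTS.md and friends; a second run rewrites only the files the user has not edited.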
    The google logo   github.com a day ago
418.  HN Ask HN: How does one review code when most of the code is written by AI?
The discussion highlights the challenges encountered in reviewing AI-generated code, particularly when using multiple cloud agents. Despite possessing demo artifacts and automation test suites, these tools are inadequate for comprehensive scenario verification because they do not keep pace with ongoing development changes. Additionally, utilizing GitHub Copilot for pull request reviews presents issues due to an excess of minor criticisms and false positives, complicating the identification of real problems. Contributors express a need for effective strategies to handle the heightened workload and complexity associated with code review in this context. The conversation underscores the necessity of finding better solutions to streamline and enhance the effectiveness of AI-assisted code review processes. Keywords: #phi4, AI code, Code review, GitHub Copilot, PRs, automation test suites, cloud agents, demo artifacts, development, false positives, nitpicks, surge, true positives
    The google logo   news.ycombinator.com 2 days ago
482.  HN GitHub Security Lab's open source AI-powered vulnerability scanner
The GitHub Security Lab has introduced an open-source AI-powered vulnerability scanner that utilizes Taskflow Agents and auditing taskflows to detect web security vulnerabilities, especially in open source projects. These taskflows prioritize high-impact issues like authorization bypasses and information disclosure by verifying results manually, rather than exploring numerous non-exploitable possibilities. This allows researchers to focus on validating severe findings which can lead to unauthorized data access or privilege escalation. The scanner has reported over 80 vulnerabilities, including those in ecommerce applications and the Rocket.Chat platform, with these discoveries being openly shared for community contributions. Taskflows, configured in YAML, guide AI models through a sequence of tasks to systematically assess code components, thereby reducing false positives and mitigating inaccuracies by using structured prompts and contextual data from threat modeling. The tool highlights the necessity of understanding a project's functionality and security boundaries to accurately identify vulnerabilities, offering guidelines for pinpointing application entry points, evaluating risks, and auditing potential issues with stringent criteria. The system is capable of being run on private repositories and can be applied to users’ own projects. GitHub Security Lab encourages community engagement by using these taskflows on their projects and contributing new ones, promoting collaborative efforts towards enhanced security practices. This initiative illustrates the significant role AI can play in improving code audits and vulnerability management within software development. 
Keywords: #phi4, AI-powered scanner, CSRF, CVE identifiers, GitHub Security Lab, IDOR issues, LLMs (Large Language Models), SQL injection, SSRF, XSS, XXE, auditing taskflows, authentication issue, authorization bypasses, business logic issue, command injection, file upload handling, information disclosure, insecure deserialization, memory safety, open redirect, remote code execution, security misconfiguration, template injection, threat modeling, vulnerability scanner, web security vulnerabilities
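The taskflow idea — an ordered sequence of prompts whose answers feed later steps — can be sketched in a few lines. The task fields and the `ask_model` callback below are illustrative stand-ins; the real taskflows are YAML files with their own schema.

```python
def run_taskflow(tasks, ask_model):
    """Run an ordered audit taskflow, threading earlier answers into
    later prompts so each step sees the accumulated context.

    `tasks` is a list of {"name", "prompt", "report"} dicts and
    `ask_model` is any callable that sends a prompt to an LLM; both are
    hypothetical simplifications of the tool's YAML-defined taskflows.
    """
    context, findings = [], []
    for task in tasks:
        prompt = task["prompt"]
        if context:
            prompt += "\n\nContext from earlier steps:\n" + "\n".join(context)
        answer = ask_model(prompt)
        context.append(f'{task["name"]}: {answer}')
        # Only designated reporting steps produce findings for manual review.
        if task.get("report") and answer.strip().lower() != "none":
            findings.append((task["name"], answer))
    return findings
```

This mirrors the structure the entry describes: first map entry points and security boundaries, then audit with strict reporting criteria so only high-impact candidates surface for manual verification.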
    The google logo   github.blog 2 days ago
501.  HN Emacs and Vim in the Age of AI
The article examines the potential impact of artificial intelligence (AI) on traditional text editors such as Emacs and Vim, which have long been favored by developers. It addresses both risks and opportunities associated with integrating AI into these platforms. A significant risk is the dominance of Integrated Development Environments (IDEs) like VS Code, which benefit from seamless AI integration through tools like GitHub Copilot, potentially drawing users away from Emacs and Vim due to their complex customization requirements. Additionally, as AI automates more coding tasks, the emphasis shifts towards developers' ability to articulate their intent and evaluate AI-generated code, reducing the necessity for rapid manual editing skills. The resource disparity is also highlighted; whereas VS Code enjoys corporate support, Emacs and Vim rely on smaller community-driven efforts. However, opportunities exist for these traditional editors in simplifying customization through AI, which can translate natural language commands into scripts within their frameworks. Furthermore, AI tools could assist in plugin development by aiding contributors with tasks like test scaffolding or documentation generation. The existing integration of AI technologies within Emacs and Neovim suggests a promising potential for enhancing these text editors' workflows. The article also considers the broader implications of this shift. Text editors are transitioning from primary coding environments to platforms where developers primarily refine AI-generated code, emphasizing their role in workflow management rather than direct input generation. This evolution presents ethical concerns such as the environmental impact of large language models and copyright issues related to training data, alongside fears of job displacement due to increased productivity from AI tools. Some community members have even created forks of existing editors to avoid AI integration. 
In conclusion, while challenges posed by AI are substantial, the enduring adaptability of Emacs and Vim—alongside their dedicated communities—positions them for potential survival in an AI-driven future. Their continued relevance hinges on effectively integrating new technologies without compromising the core values that initially attracted users. Active engagement with emerging tools and community participation will be crucial to their success amidst these technological advancements. Keywords: #phi4, AI, Copilot, Emacs, IDEs, Neovim, VS Code, Vim, adaptation, automation, community, configuration, efficiency, ethical concerns, integration, keybindings, learning curve, open-source, plugins, programming
    The google logo   batsov.com 2 days ago
532.  HN Show HN: AMP – Open protocol for AI conversation portability
AMP (AI Memory Protocol) is an open protocol developed to standardize AI conversation data across various platforms like ChatGPT, Claude, Gemini, and others, which currently use distinct formats for exporting conversation histories, thereby hindering interoperability and integration. AMP introduces a unified schema comprising `AMPMessage` and `AMPConversation` structures that encapsulate essential details such as message IDs, roles, content, platform identifiers, timestamps, etc., to facilitate easy conversion and migration of data between systems. Key features of AMP include auto-detection capabilities for identifying source platforms and converting their exports into a standardized format. It provides export methods that allow the transformation of various formats like nested DAGs, JSON, SQLite databases, BSON timestamps, among others, into its structured schema. Additionally, AMP offers a library (`@purmemo.ai/converters`) to enable developers to perform these conversions programmatically using JavaScript. The protocol is implemented as an open-source project under the Apache-2.0 license, inviting contributions from the developer community. It currently includes converters for several platforms with plans to extend support to others such as Poe and Amazon Q. To engage users and developers, AMP fosters a community through its Discord channel, facilitating discussions on development and contributions. For quick adoption, AMP provides a CLI tool (`npx @purmemo.ai/migrate`) that enables users to convert existing conversation exports into the AMP format efficiently, supporting various input formats and offering a human-readable markdown output. Overall, AMP aims to enhance AI conversation data portability, allowing for more seamless integration and management of AI interactions across multiple platforms. Keywords: #phi4, AI, AMP, BSON, CLI, DAG, JSON, SQLite, conversation portability, converters, export, open-source, protocol, schema
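The `AMPMessage`/`AMPConversation` structures and the auto-detection step can be sketched as below. The field names and detection heuristics are assumptions for illustration; the authoritative schema lives in the `@purmemo.ai/converters` package.

```python
from dataclasses import dataclass, field

@dataclass
class AMPMessage:
    id: str
    role: str          # "user" | "assistant" | "system"
    content: str
    platform: str      # e.g. "chatgpt", "claude", "gemini"
    timestamp: str     # ISO 8601

@dataclass
class AMPConversation:
    id: str
    platform: str
    messages: list = field(default_factory=list)

def detect_platform(export: dict) -> str:
    """Guess the source platform from the shape of an export dict.

    These heuristics are hypothetical, not the library's actual rules.
    """
    if "mapping" in export:          # ChatGPT-style nested DAG of nodes
        return "chatgpt"
    if "chat_messages" in export:    # Claude-style flat message list
        return "claude"
    return "unknown"
```

A converter would detect the platform, walk the source structure, and emit a normalized `AMPConversation` that any downstream tool can consume.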
    The google logo   github.com 2 days ago
   https://purmemo.ai   2 days ago
540.  HN Show HN: Overture – A visual plan interceptor for AI coding agents
Overture is a visual tool designed to improve transparency and control when using AI coding agents such as Cursor, Claude Code, Cline, Copilot, and Sixth AI. It addresses the issue of these agents beginning to write code immediately upon receiving a user prompt without providing an initial execution plan, which often leads to inefficiencies due to misunderstandings that necessitate discarding generated plans. To resolve this, Overture intercepts the planning phase of AI agents and presents it as an interactive flowchart before any coding begins. This allows users to view, modify, or approve the plan, ensuring alignment with their objectives. The visualization includes detailed node information such as complexity levels, required inputs, risks, and context attachments. Overture features an Interactive Plan Canvas for real-time visualization and manipulation, a Node Details Panel for in-depth analysis of each step, and Dynamic Fields that accept various user inputs. Additionally, it provides Branch Detection & Selection to choose among multiple approaches, a Requirements Checklist to confirm all necessary conditions before execution, and Execution Controls enabling users to pause, resume, or re-run tasks as needed. The tool operates as a Model Context Protocol (MCP) server, making it compatible with different AI agents; it can be installed globally via npm. Users have the flexibility to configure Overture for specific agents through settings files and customize its behavior using environment variables. Keyboard shortcuts are available for quick interactions such as plan approval or execution control. Overture is open-source under the MIT License, inviting community contributions and improvements, with technologies like Node.js, React, and Dagre used in its development. 
By providing a visual plan before code execution, it enhances transparency, allows user control over AI decisions, supports multi-project management, and ensures efficient resource use by preventing unwanted code generation. As part of Sixth's suite, Overture offers an integrated experience within VS Code that requires no configuration. Keywords: #phi4, AI coding agents, MCP server, Overture, choice, context, contributing, control, development, efficiency, extensible, history, interactive flowchart, interceptor, interpretability, license, multi-project, offline, open source, planning phase, real-time execution, safety, tech stack, transparency, trust, visibility, visual plan
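The plan nodes described above (complexity, required inputs, risks) and the approve-before-execute gate can be sketched as a small data model; the exact shape here is an assumption for illustration, not Overture's internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    """One step in an intercepted plan, mirroring the node details the
    entry describes: complexity level, required inputs, and risks."""
    title: str
    complexity: str                        # e.g. "low" / "medium" / "high"
    inputs: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    approved: bool = False

def may_execute(plan):
    """Gate execution: the agent may write code only once every node in
    a non-empty plan has been reviewed and approved by the user."""
    return bool(plan) and all(node.approved for node in plan)
```

The flowchart UI is then a view over this structure: the user inspects each node, fills in dynamic fields, and flips approvals until the whole plan may proceed.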
    The google logo   github.com 2 days ago
569.  HN Show HN: Think Better – Inject Decision Frameworks into Claude and Copilot
"Think Better" is an advanced AI tool designed to enhance decision-making and problem-solving by integrating structured frameworks into popular AI assistants such as Claude, GitHub Copilot, and Antigravity. The tool transforms ambiguous issues into clear action plans using 10 decision frameworks, 12 cognitive bias warnings, and 10 decomposition methods. Its functionalities enable users to classify problems, recommend appropriate frameworks, generate comparison matrices, and document decisions for future reflection. Users can leverage Think Better to address various challenges like choosing between job offers or resolving technical issues, with guidance tailored based on recognized biases or applicable frameworks. The tool is accessible via binary download (recommended), Go install, or building from source—requiring Python 3 for certain scripts. It includes two primary skills: "/make-decision," which aids in decision-making through comparison matrices and cognitive bias warnings; and "/problem-solving-pro," a general problem-solving skill utilizing a 7-step methodology. Additionally, Think Better offers command options to manage AI skills, with a requirement of Go 1.25+ for building from source. As an open-source project under the MIT License, it invites community contributions and provides installation guides available in English and Vietnamese. Keywords: #phi4, AI, Binary Choice, Cognitive Biases, Communication Patterns, Contributing, Decision Frameworks, Decision-Making, Decomposition Methods, Go Files, Issue Tree, Knowledge Records, Mental Models, Open Source, Problem-Solving, Python Scripts, Team Dynamics, Trigger Phrases
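A comparison matrix like the one the tool generates is a weighted sum over criteria. The sketch below is a generic weighted-matrix implementation under that assumption, not Think Better's exact output format.

```python
def comparison_matrix(options, criteria):
    """Score each option as the weight-adjusted sum of its ratings.

    `options` maps an option name to {criterion: rating}; `criteria`
    maps each criterion to its weight. Returns all scores plus the
    highest-scoring option.
    """
    scores = {name: sum(criteria[c] * ratings[c] for c in criteria)
              for name, ratings in options.items()}
    best = max(scores, key=scores.get)
    return scores, best
```

For the job-offer example the entry mentions, the user would rate each offer per criterion, weight the criteria, and read off the winner — while the bias warnings flag things like anchoring on the first salary figure seen.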
    The google logo   github.com 2 days ago
573.  HN Show HN: VS Code Agent Kanban: Task Management for the AI-Assisted Developer
Agent Kanban is an extension for Visual Studio Code (VS Code) designed to enhance task management specifically for developers using AI coding agents like GitHub Copilot. Addressing challenges such as context rot and lack of persistent task history, it integrates a kanban board within VS Code, allowing structured planning without requiring its own agent harnesses. The main features include GitOps & Kanban Board Integration, which promotes team collaboration through an integrated kanban board; Structured Workflow via Commands using @kanban commands to manage tasks; Markdown as Source of Truth, employing version-controlled Markdown files for task records and decision logs; and a GitOps Friendly Design that ensures all task history is committed to Git for transparency. The workflow involves documenting tasks in Markdown files with YAML frontmatter and seamlessly integrating with GitHub Copilot by adding a @kanban chat participant. Developers guide the agent through tasks using simple verbs like plan, todo, and implement, while the kanban board provides an overview of task progress. Agent Kanban maintains simplicity, supports collaborative environments with Git-tracked workflows, and ensures that decisions and plans are preserved for team visibility. It offers a lightweight yet effective solution for streamlining AI-assisted workflows with context and version control, available on the VS Code Marketplace with its source code hosted on GitHub. Keywords: #phi4, AI-Assisted Developer, Agent Kanban, Context Rot, Extension, GitHub Copilot, GitOps, IDE Integration, Kanban Board, Markdown, Plan/Todo/Implement, Task Management, VS Code, Workflow
    The google logo   www.appsoftware.com 2 days ago
   https://github.com/openai/symphony   2 days ago
   https://github.com/LachyFS/kanban-markdown-vscode-exten   2 days ago
   https://www.appsoftware.com/blog/introducing-vs-code-ag   2 days ago
   https://www.youtube.com/watch?v=Y4a3FnFftKw   2 days ago
   https://github.com/appsoftwareltd/vscode-agent-kanban   2 days ago
   https://boristane.com/blog/how-i-use-claude-code/   2 days ago
   https://github.com/TechDufus/openkanban   a day ago
   https://kanboard.org/   a day ago
   https://github.com/rcarmo/piclaw   a day ago
576.  HN Custom Agents in Visual Studio
Visual Studio enhances its assistant capabilities with custom agents designed specifically for debugging, profiling, testing, and modernizing code, integrating deeply with its native tools to offer advanced features like systematic error diagnosis, performance optimization suggestions, tailored unit test generation, and framework upgrades supported by migration assistance. Beyond these preset options, developers can create personalized agents using a foundation that includes workspace awareness, code understanding, and the ability to connect external knowledge sources through the Model Context Protocol (MCP). This customization enables workflows such as automated code reviews aligned with style guides or enforcing design systems linked to Figma files. Custom agent configurations are established in `.agent.md` files within the `.github/agents/` directory of a repository. Although this feature is currently in preview and may change, it fosters community engagement by inviting developers to share their setups through the awesome-copilot repo, which collects community-refined agent configurations tailored for Visual Studio. Developers interested in contributing configurations or providing feedback are encouraged to use that repository or official channels. Keywords: #phi4, Code Review, Code Understanding, Custom Agents, Debugger, Design System Enforcement, External Knowledge Sources, Feedback, GitHub Copilot, MCP, Model Picker, Modernize, Planning, Preset Agents, Profiler, Test, Tool Names, Visual Studio, Workspace Awareness
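A minimal `.agent.md` under `.github/agents/` might look like the sketch below; the frontmatter keys and the instruction text are assumptions for illustration, since the format is in preview and may change.

```markdown
---
name: style-reviewer
description: Reviews changed files against the team style guide
---

You are a code review agent. Compare each changed file against the
rules in STYLEGUIDE.md and report every violation with a suggested fix.
```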
    The google logo   devblogs.microsoft.com 2 days ago
632.  HN Show HN: ChatML - Run Claude Code Parallel Sessions in a Desktop app
ChatML is a macOS desktop application designed to enhance developers' productivity by enabling the concurrent execution of multiple AI coding agents through Claude Code. The app removes the limitation of running only a single coding session at a time by leveraging git worktrees, which allows tasks like refactoring code, adding API endpoints, fixing bugs, or writing tests to run independently and prevent merge conflicts. Users can register any Git repository to set up isolated workspaces with dedicated branches and directories for each task. Key features of ChatML include autonomous AI agents in separate sessions capable of performing file operations and executing commands. It integrates a built-in code review system and facilitates GitHub pull request creation directly from the application. Additionally, it offers access to a marketplace of specialized prompt templates that enhance functionality. Developers have control over their budget with real-time monitoring of token usage, providing efficient resource management. Open-source under GPL-3.0, ChatML encourages community contributions, particularly for extending compatibility to Windows and Linux platforms. The app employs a polyglot architecture consisting of Tauri 2 (Rust) for the desktop shell, Next.js and React for the frontend interface, Go and SQLite for backend management, alongside Node.js with Claude Agent SDK for AI functionalities. Security is emphasized through the encryption of API keys and isolated session operations without telemetry, ensuring user data protection. ChatML is freely available for use, modification, and distribution under its open-source license, positioning it as a versatile tool for developers looking to optimize their coding workflow through parallelized AI-driven tasks. 
Keywords: #phi4, AI coding agents, API key, Agent SDK, ChatML, Claude Code, GNU General Public License, GitHub, Go Backend, Linux, Nextjs, Nodejs, Tauri, UI/UX, Windows, cross-platform support, desktop app, documentation, git worktrees, isolated worktree, macOS, parallel sessions, security, testing
    The google logo   github.com 2 days ago
   https://code.claude.com/docs/en/common-workflows   2 days ago
654.  HN So You Want to Do Agentic Development
As of 2026, coding with AI agents has become widespread and sophisticated. For newcomers, selecting mature tools such as VS Code paired with GitHub Copilot is recommended for their control and enterprise suitability. Additionally, Mistral Vibe and Gemini CLI are suggested for experimentation within free usage limits, while OpenCode should be approached cautiously due to its limited safety features. Sandboxing is emphasized to safeguard personal data, advocating the use of AI tools from providers like Anthropic or OpenAI within sandboxes instead of costly subscriptions. The principle "Fast, Good, Cheap: pick two" persists, as local AI still cannot match the capabilities of cloud models. To maximize AI assistance in workflows, structured documentation is key; projects should utilize SPEC.md for specifications and SKILL.md for coding guidelines to enhance agent accuracy. The PLAN.md loop aids task management by dividing work into focused segments with continuous review and updates. Steering—guiding agents through tests, linting, example-based learning, or model adjustments—is crucial for maintaining output quality. Using strongly typed languages such as Go, Rust, and TypeScript improves the AI's understanding and self-correction capabilities. The author's approach has matured into a reliable mobile agentic assistant with future plans aiming to enable collaborative agent interactions to share context and skills efficiently. Keywords: #phi4, Agentic Development, GitHub Copilot, Language Matters, PLANmd, Privacy, SKILLmd, SPECmd, Sandbox, Security, Steering, Tooling, VS Code, Workflow
    The google logo   taoofmac.com 2 days ago
655.  HN Aiswitch – switch between Claude, OpenAI, Gemini and Copilot accounts in one cmd
Aiswitch is a command-line utility designed to simplify the management of multiple AI accounts across platforms such as Claude, OpenAI, Gemini, and GitHub Copilot by enabling rapid switching with a single command. It supports cross-platform usage on macOS, Linux, and Windows, integrating seamlessly with tools like Cursor, Windsurf, and any terminal application through an interactive TUI for easy profile navigation. Key features include per-project auto-switching using a `.aiswitch` file in repositories, shell integration to update environment variables dynamically, and automatic IDE configuration updates for settings.json in supported environments. Installation can be done via Go with `go install`, by downloading pre-built binaries from GitHub Releases based on the user's OS and architecture, or by building from source through cloning the repository and executing a make command. Post-installation setup involves configuring shell integration using `aiswitch setup` and sourcing the appropriate shell file, followed by adding and switching profiles using commands like `aiswitch add` and `aiswitch use <profile>`. Configuration details include storing profile information in `~/.aiswitch/` with separate configuration (`config.json`) and secrets (`secrets.json`) files. The latter is secured with restrictive permissions (mode 0600) to protect sensitive data, which should not be committed to version control. Future enhancements planned for Aiswitch encompass integration with OS keychains for enhanced secret management, support for additional providers such as Ollama, Azure OpenAI, and AWS Bedrock, and improved shell completion features. Released under the MIT License, Aiswitch aims to streamline AI account management efficiently across diverse development environments. 
Keywords: #phi4, API keys, IDE integration, accounts, aiswitch, command, cross-platform, environment variables, multi-account, per-project configuration, profiles, secrets management, shell integration, version switcher
    The google logo   github.com 2 days ago
658.  HN FastFlowLM Docker – Run LLMs on AMD Ryzen AI NPU (Linux)
"FastFlowLM Docker" is a project designed to enable running large language models (LLMs) on AMD Ryzen AI NPUs using Linux within a Docker environment. Developed by Claude Opus 4.6 with GitHub Copilot CLI, it addresses the lack of official support for AMD's XDNA2 NPU on Linux by automating the FastFlowLM build process from source code. The project supports any AMD processor equipped with an XDNA2 NPU, such as the Ryzen AI 9 HX series, and requires a specific Linux kernel version alongside AMD’s amdxdna driver and Docker to function. The setup guide provides instructions for installing necessary components on Ubuntu 24.04, including memory limit configurations. Users can build the FastFlowLM Docker image from source and execute various commands within Docker to list available models, download them, run validations or serve LLMs on the NPU. Performance metrics like Time To First Token (TTFT), token generation speed, and model parameters for models such as Qwen3 and Llama 3.2 are provided to evaluate efficiency. The project's workings involve a Dockerfile that includes a build stage with dependencies and source compilation, followed by a runtime stage containing essential binaries and libraries. NPU access is achieved using `--device=/dev/accel/accel0`, facilitating communication through the amdxdna driver. Additionally, troubleshooting tips are provided for common issues like missing NPUs or permission errors. Distributed under the MIT license, "FastFlowLM Docker" utilizes FastFlowLM as its runtime and acknowledges licenses from other components such as the amdxdna driver and AMD XRT. 
Keywords: #phi4, AMD Ryzen AI NPU, AMD XRT, Boost, Docker, FFTW3, FLM C++ build, FastFlowLM, FastFlowLM#381, Linux, Llama 32, MIT licensed, OpenAI-compatible API server, Phi-4 Mini, Qwen3, Rust compilation, TTFT, XDNA2 NPU, XRT headers, Xilinx Runtime, amd/RyzenAI-SW, amdxdna driver, benchmarks, cmake, flm list, memlock, ninja, onnxruntime_providers_ryzenaiso, runtime dependencies, tokens/s
    The google logo   github.com 2 days ago
670.  HN Show HN: Forgiven – Emacs and Vim Reborn
"Forgiven v0.5.0-alpha.1" is an innovative terminal-based AI-first code editor that draws inspiration from both Emacs and Vim, offering a modal editing experience encompassing normal, insert, visual, and command modes. Its key features include integration with GitHub Copilot for inline completions and chat functionalities, advanced navigation tools, buffer management, and file exploration capabilities. Additionally, it provides robust Git support, including commit generation and markdown preview caching, while also supporting syntax highlighting via a Base16 Ocean Dark theme using syntect. The editor enhances productivity with its debugging panel, performance improvements such as vertical split screen, and integration with tools like lazygit. It features project-wide search functionality through ripgrep and offers markdown rendering capabilities that include Mermaid diagrams. With fuzzy-style buffer/file pickers and inline file/folder management options, Forgiven is designed to handle a variety of development tasks efficiently. Built on the ratatui framework with a crossterm backend, it leverages Tokio for asynchronous runtime operations. The editor focuses heavily on privacy and security, restricting outbound connections solely to GitHub's official endpoints during Copilot usage and ensuring no telemetry or analytics are collected. Development practices include security measures like cargo-audit and code scanning. Currently in alpha development, Forgiven invites user feedback and bug reports, operating under the MIT license. Its project structure is meticulously documented through Architecture Decision Records (ADR). Keywords: #phi4, Emacs, GitHub Copilot, LSP support, Vim, agent panel, file explorer, lazygit integration, markdown preview, modal editing, project-wide search, syntax highlighting, terminal editor, undo/redo
    github.com 3 days ago
684.  HN Show HN: Think Better – 155 decision-science rules for your AI assistant
"Think Better" is an open-source tool designed to enhance the capabilities of AI assistants by incorporating structured decision-science frameworks, which address the challenge of generic responses to complex queries. The system features 155 organized knowledge records that encompass ten decision frameworks, twelve cognitive biases, ten decomposition methods, and twelve mental models. It utilizes a Python BM25 search engine to classify problems accurately and suggest relevant frameworks while also flagging potential cognitive biases. The tool is intended for local use without the need for API keys or telemetry and supports platforms such as Claude AI, GitHub Copilot, and Antigravity. Users can install "Think Better" into their AI workspace via CLI commands, allowing them to describe problems in plain language and receive structured action plans. Key features include decision classification, framework recommendations, cognitive bias alerts, generation of comparison matrices, and documentation of decisions. The project encourages user feedback on additional frameworks or biases, alternative skill formats, and search methodologies. Installation is straightforward with detailed instructions for Linux/macOS or Windows systems. Users can interact with their AI to obtain specific analysis methods, like binary choice frameworks or issue tree decompositions, thereby improving decision-making efficiency. Overall, "Think Better" transforms vague problems into clear action plans by embedding structured thinking directly into AI interactions, enhancing problem-solving and decision-making capabilities across various contexts. Keywords: #phi4, AI assistant, BM25 search engine, GitHub Copilot, Go CLI, Hypothesis Trees, MECE Profitability Tree, Pre-mortem, Python, Weighted Matrix, cognitive biases, decision science, mental models
    github.com 3 days ago
716.  HN Coworking for Punks
"Coworking for Punks" explores the utilization of intelligent agents for non-coding, knowledge-based tasks, presenting alternatives to existing products such as Anthropic's "Cowork." The article advocates for OpenCode Desktop, emphasizing its advantages due to its flexibility and open-source nature. It allows integration with multiple AI models like GPT-5.4, Claude, and Gemini through services including ChatGPT Plus and GitHub Copilot Pro+, offering users more control over their tools without dependence on proprietary servers. The article further highlights the significance of connectors—CLI utilities and agent skills—as essential for integrating these intelligent agents with applications such as Google Workspace, Todoist, Agent Browser, Obsidian, and QMD. These integrations are vital in enhancing productivity within software development tasks by tailoring the setup to meet specific user needs. Moreover, "Coworking for Punks" introduces Elite AI-Assisted Coding as a comprehensive course designed to teach effective utilization of AI agents in software development, currently available at an early bird discount. It also invites readers who are interested in setting up personalized agentic environments or require troubleshooting assistance to participate in free educational sessions like Sunday School. This provides a platform for learning and community engagement within the tech space. Keywords: #phi4, AI models, Agent Browser, Anthropic, CLI utilities, Claude Cowork, Coworking, GPT-5.4, GitHub Copilot Pro+, Google Workspace, MCP servers, Obsidian, OpenCode Desktop, Punks, QMD, Todoist, Zen Go, agent skills, connectors
    everything.intellectronica.net 3 days ago
720.  HN Cursor went from $0 to $29B to existential threat in three years
Cursor, an AI-powered coding tool developed by Anysphere, saw rapid growth from its launch in 2022 to a peak valuation of $29 billion within three years due to its advanced features like autocomplete and natural language editing in a VS Code fork. However, by mid-2025, the emergence of autonomous coding agents capable of executing tasks without continuous human input rendered Cursor's model obsolete, causing a swift decline as developers shifted toward these more efficient tools. This transformation from assisting in code writing to autonomously generating and executing code marked a significant paradigm shift that led Cursor from market dominance to an existential crisis. The case underscores the rapidly shrinking lifecycles of AI-driven products, where groundbreaking innovations can quickly become obsolete within months rather than years. For product builders, this highlights the importance of focusing on durable infrastructure layers such as databases and payment systems that provide long-term stability, in contrast to UI features vulnerable to rapid obsolescence. Cursor's experience serves as a cautionary tale for startups about the risks of over-relying on current AI capabilities without anticipating future technological shifts, emphasizing the need for strategic adaptability and investment in areas with more enduring relevance amidst fast-paced changes in technology landscapes. Keywords: #phi4, AI, Cursor, autonomous agents, developers, existential threat, funding, infrastructure, innovation, product lifecycle, startup, strategy, technology compression, valuation
    www.permissionprotocol.com 3 days ago
761.  HN Microsoft/Hve-Core
HVE Core is a framework designed specifically for GitHub Copilot, aimed at enhancing prompt engineering through constraint-based AI workflows. It serves enterprise environments by facilitating efficient management of AI-driven tasks for both individual developers and large teams. Key components include 34 specialized agents, 68 coding instructions, 40 reusable prompts, and 3 skills. The methodology employs the RPI approach—Research, Plan, Implement—emphasizing verified outcomes over mere plausible code. HVE Core is accessible as a VS Code extension or Copilot CLI plugin, with installation taking approximately 30 seconds. Users can quickly start by checking agent availability in GitHub Copilot Chat and experimenting with creating a memory file using the designated memory agent. The framework comprises four main artifact types: Activation Instructions, which are automatically triggered via specific file patterns; Prompts that require manual initiation and include task-specific input variables; Agents, representing specialized personas with constraints accessible through an agent picker; and Skills, which are cross-platform scripts executed on demand. All AI artifacts undergo rigorous validation through CI/CD processes using JSON schema enforcement. The project structure includes directories for agents, instructions, prompts, skills, workflows, documentation, and source scripts, supporting a comprehensive development environment. Open contributions to the framework are encouraged, with guidelines provided in a contributing guide. Microsoft promotes ethical AI practices under its Responsible AI Standard while licensing HVE Core under the MIT License, accompanied by specific security and governance policies. Compliance with Microsoft's trademark usage guidelines is required for using associated trademarks. 
Keywords: #phi4, AI workflows, Agents, GitHub Copilot, HVE Core, Hypervelocity Engineering, JSON schema, RPI methodology, Responsible AI, VS Code extension, constraint-based design, enterprise-ready framework, prompt engineering, specialized agents, validation pipeline
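The CI/CD validation of AI artifacts via JSON schema enforcement could be approximated in plain Python along these lines. The manifest fields shown are hypothetical, not HVE Core's real schema:

```python
import json

# Hypothetical agent-manifest shape; HVE Core's real schemas differ.
REQUIRED = {"name": str, "description": str, "constraints": list}

def validate_manifest(raw: str):
    """Return a list of validation errors (empty list == valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors

good = '{"name": "memory", "description": "Creates memory files", "constraints": []}'
bad = '{"name": "memory"}'
print(validate_manifest(good))  # []
print(validate_manifest(bad))
```

A CI job would run a check like this over every agent, instruction, and prompt file and fail the pipeline on any non-empty error list.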
    github.com 3 days ago
771.  HN Superpowers for Claude Code: Complete Guide 2026
"Superpowers for Claude Code: The Complete 2026 Guide" presents an open-source framework that revolutionizes AI-driven code generation by embedding professional development practices into AI workflows, thereby improving the quality and maintainability of generated code. It features a comprehensive 7-phase workflow incorporating Socratic brainstorming, detailed task planning, Test-Driven Development (TDD), concurrent sub-agent execution, and systematic code reviews. This approach enables deep idea refinement through dialogue and breaks projects into manageable tasks while employing specialized agents to expedite development by three to four times compared to linear methods. By prioritizing test writing before coding, the framework ensures reliability and thorough testing of the code. Additionally, it automates code reviews to ensure adherence to standards and security compliance prior to merging. Available via Claude Code's marketplace or the Anthropic platform since January 2026, installation is straightforward with command verification through `/help`. A real-world application demonstrates its efficacy by building a Notion clone, showcasing tasks like setting up Next.js projects and achieving high test coverage. Compared to alternatives such as Cursor, GitHub Copilot, and Standard Claude Code—each offering varied benefits but lacking structured workflow support—"Superpowers" provides a complete methodology suitable for complex and mission-critical projects. Ideal for teams requiring rigorous methodologies like TDD and Agile or those developing production-ready applications with clear architectures, the framework does require initial investment in brainstorming and planning. Developed by the community rather than officially supported by Anthropic, it is recognized for its quality and promises ongoing evolution through new skills and integrations. 
Ultimately, "Superpowers" significantly enhances Claude Code's capabilities, offering a disciplined approach to AI-assisted software development for complex and reliable project needs. Keywords: #phi4, AI development, Anthropic marketplace, Claude Code, FAQs, Git worktrees, GitHub stars, IDE integration, Socratic brainstorming, Superpowers, TDD cycle, Test-Driven Development (TDD), brainstorming, code review, collaboration skills, community support, comparison, debugging skills, development philosophy, enterprise quality, error handling, execution, limitations, micro-task planning, open-source framework, parallel development, planning, professional methodology, skill creation tools, software methodologies, sub-agent-driven development, supported platforms, testing skills, workflow
    www.pasqualepillitteri.it 3 days ago
803.  HN Show HN: Apc-CLI – sync AI memory across Claude Code, Cursor, Copilot
APC-CLI is a synchronization tool aimed at harmonizing the contexts of various AI coding tools across multiple platforms such as Claude Code, Cursor, Copilot, Gemini CLI, Windsurf, and OpenClaw. It addresses challenges related to different storage locations and formats for skills, MCP servers, memory, and API keys used by these diverse tools, which complicates switching between them or setting up new systems. The tool offers three core commands: `apc collect` to gather data from installed tools, `apc status` to report synchronization states, and `apc sync` to distribute collected data across configured AI tools, all while managing secrets securely using the OS keychain without requiring cloud accounts. APC-CLI supports offline operation, resolves conflicts intelligently, and tracks changes through manifests to prevent accidental overwrites. It allows users to install reusable skills from GitHub and set up LLM providers for memory synchronization. Available under the MIT license, installation options include pip or direct script execution, along with an interactive setup wizard and a detailed command reference. The tool centralizes configurations into a local cache (located at ~/.apc/) using JSON files to store skill details, MCP server configurations, and memory entries, ensuring that secrets are redacted and securely stored. This centralized management facilitates a consistent experience across different AI tools by maintaining a unified format locally before syncing to each tool's native formats. For developers, APC-CLI supports integration with various LLM providers like Anthropic, OpenAI, Google Gemini, among others, offering both interactive and non-interactive setup options. The development process includes open contributions through issues and pull requests, code linting, formatting using ruff, and conducting integration tests with Docker. 
Keywords: #phi4, AI tools, API keys, CLI, LLM, MCP servers, MIT license, apc-cli, configuration, conflict resolution, context, contributing, development, export/import, installation, local cache, manifest tracking, memory, multi-tool sync, offline-first, skills, sync
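Conceptually, a unified local cache with secret redaction might work like the sketch below. The field names, secret pattern, and merge policy are all assumptions made for illustration; APC-CLI's actual ~/.apc/ layout and redaction rules will differ:

```python
import json
import re

# Hypothetical credential pattern; a real tool would be far more thorough.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]+", re.IGNORECASE)

def redact(value: str) -> str:
    """Replace anything that looks like a credential before caching it."""
    return SECRET_PATTERN.sub("[REDACTED]", value)

def merge_tool_configs(configs):
    """Fold per-tool configs into one unified cache entry."""
    unified = {"skills": [], "mcp_servers": {}, "memory": []}
    for cfg in configs:
        unified["skills"].extend(cfg.get("skills", []))
        unified["mcp_servers"].update(cfg.get("mcp_servers", {}))
        unified["memory"].extend(redact(m) for m in cfg.get("memory", []))
    return unified

# Invented per-tool fragments standing in for Claude Code and Cursor data.
claude_cfg = {"skills": ["git-helper"], "memory": ["token sk-abc123 used for CI"]}
cursor_cfg = {"mcp_servers": {"files": {"cmd": "mcp-files"}}}
print(json.dumps(merge_tool_configs([claude_cfg, cursor_cfg]), indent=2))
```

The real tool keeps actual secrets in the OS keychain and writes only redacted JSON into the cache, which is what makes the cache safe to sync and inspect.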
    github.com 3 days ago
841.  HN AI-Powered F1 Predictions
The author delves into utilizing AI models for forecasting Formula 1 outcomes as part of an annual, non-competitive prediction tournament. Using advanced tools like GitHub CoPilot Enterprise and Google Gemini Pro, the objective is to contrast human predictions against those from AI models developed by Google (Gemini 3.1 Pro), Anthropic (Claude Opus 4.6), and OpenAI (GPT-5.3-Codex) for the 2026 F1 season. For the initial Melbourne race, each model receives identical data on drivers Lindblad, Piastri, Perez, and Bottas to predict their finishing positions and determine which driver is most likely to advance. Despite slight variations, all models generally agree that Cadillac will perform well, with none predicting a local favorite as the winner. Gemini highlights that Constructors' Champions lack pace advantage compared to the previous year. The author uses Gemini’s analysis for betting on the Australian Grand Prix and the entire season with hypothetical funds, focusing on Mercedes and Ferrari due to perceived testing advantages. Future plans include publishing race weekend results alongside AI predictions and betting outcomes, maintaining a balance between experimentation and enjoyment. Keywords: #phi4, AI-Powered Predictions, Anthropic Claude, BTRFS, Bazzite, Betting Markets, Constructors' Championship, Drivers, Drivers' Championship, Ferrari, Formula 1, Free Practice, GPT-5.3-Codex, Generative AI, GitHub CoPilot CLI, Google Gemini, McLaren, Mercedes, OpenClaw, Overtakes, Predictions Tournament, Red Bull
    danielfinch.co.uk 4 days ago
864.  HN AI Engineer will be the LAST job
The text explores the evolving role of artificial intelligence (AI) in white-collar professions, particularly focusing on software engineering, where there are growing concerns about job displacement as AI capabilities expand. This situation is likened to a Jevons Paradox scenario, where AI tools automate entire jobs rather than just tasks. Despite these advancements, it's anticipated that the role of "AI Engineer" will persist, essential for developing and refining AI systems. By 2026, knowledge work agents—software coding agents with additional skills—are expected to dominate professional fields due to their improved ability to handle traditional white-collar tasks. Recent developments in AI models such as OpenAI's GPT-5.4 are highlighted, noting both performance improvements over earlier versions and increased costs. Community benchmarks reveal mixed results regarding efficiency when compared to other models like Claude. Security implications arise as more capable AI systems excel at discovering vulnerabilities and developing exploits; initiatives like OpenAI's Codex Security program aim to mitigate these risks by identifying and addressing software vulnerabilities. The text also discusses advancements in inference and kernel engineering, which seek to optimize model performance across different hardware platforms, thus enhancing computational efficiency. Additionally, there is a focus on specialized AI models and techniques designed to improve training data efficiency, reflecting ongoing innovation in creating task-specific, cost-effective solutions. This includes the application of reinforcement learning and continual adaptation methods to ensure AI systems remain relevant and effective over time. Keywords: #phi4, AI Engineer, AI-induced layoffs, Codex Security, CritPt, Discord, GPT-5.4, Jevons Paradox, KARL, KernelAgent, Knowledge Work Agents, Latent Space, MCP, Phi-4-reasoning-vision, Software Engineering, vLLM
    www.latent.space 4 days ago
867.  HN Building a Project with AI: My Experience with Agentic Development
The author details their journey in using "agentic development" with AI to create a holiday management application called HollyDayz, highlighting how they built the project by leveraging AI tools instead of traditional coding practices. This approach required setting up an environment conducive to AI utilization, primarily through VS Code enhanced by GitHub Copilot, and focused on providing clear context to improve AI outcomes. The author developed specific skills for tasks like creating single-page applications (SPA), deploying via Vercel, and managing databases, which guided the AI's actions in a structured manner. In their development process, they integrated custom agents such as "tech-writer" for documentation and UI testers, facilitating interaction with GitHub Copilot through VS Code Chat and Copilot CLI using predefined skills and context-rich prompts. This setup allowed for seamless integration of AI tools, although it occasionally necessitated clarifications from the developer. Moreover, the author experimented with GitHub Agentic Workflows to automate issue management on GitHub, demonstrating a unique feature of GitHub Copilot that integrates AI into CI/CD processes. The experience underscored the importance of proper environment setup and context provision for successful agentic development, shifting developers' roles toward decision-making and strategic direction rather than manual coding. This method leverages AI for routine tasks while maintaining necessary human oversight. The author concludes by encouraging other developers to experiment with this approach on smaller projects to explore its potential benefits. They also provide references for further exploration into the tools and methods employed in their project, inviting readers to delve deeper into agentic development practices. 
Keywords: #phi4, AI, Agentic Development, Automation, CI/CD, Coding Agent, Context, Custom Agents, Deployment, Developer, Documentation, GitHub Actions, GitHub Copilot, LLMs, MCP Tools, Prompting, Reactjs, SPA, Setup, Skills, Software Development Process, VS Code, Workflow
    swedq.se 4 days ago
897.  HN AI Tooling for Software Engineers in 2026
As of 2026, the use of AI tools among software engineers has become deeply integrated into their workflows, with nearly all surveyed respondents employing these technologies on a weekly basis and over half for at least half of their tasks. Claude Code emerges as the leading tool, rapidly gaining popularity since its release in May 2025, especially within smaller companies and among senior leadership. The landscape reflects diversity in tool usage, where most engineers employ two to four tools concurrently, with notable growth seen in OpenAI’s Codex and emerging alternatives like Gemini CLI and Antigravity. Anthropic's Opus and Sonnet models dominate the scene for coding tasks, often being the default choice provided by companies. AI agents are increasingly utilized for functions such as code review, bug fixing, and task automation, with regular users displaying more favorable perceptions of AI technologies. The adoption patterns vary significantly across company sizes; smaller firms lean towards Claude Code while larger enterprises prefer GitHub Copilot due to procurement strategies. Engineer preferences reveal a strong inclination towards Claude Code, particularly among senior engineers, who express higher satisfaction compared to other tools like Cursor. This survey encompasses experienced professionals from the US and Europe, highlighting a balanced distribution in terms of company size. Overall, these findings illustrate a dynamic AI tooling environment within software engineering, driven by mainstream adoption and influenced by organizational scale and role seniority. Keywords: #phi4, AI agents, AI market, AI models, AI tools, AI trends, Anthropic, Antigravity, Claude Code, Codex, Gemini CLI, GitHub Copilot, OpenCode, Opus, Sonnet, agent usage, company size, demographics, engineering work, mainstream adoption, software engineers, survey findings, tool preference, tool usage
    newsletter.pragmaticengineer.com 4 days ago
898.  HN Video Helper – open-source tool to extract mind maps and summaries from videos
Video Helper is an innovative open-source tool designed to optimize video learning through AI-powered enhancements. By allowing users to input videos via links or uploads, it automatically extracts key information into structured Mind Maps and summaries using sophisticated language model pipelines. The tool's standout features include Smart Pipeline Analysis for automated processing of video content, a Dynamic Mind Map offering interactive knowledge structures that can be customized, and Bi-directional Interaction which facilitates seamless navigation between mind maps, content modules, and specific video timestamps. Additionally, it supports AI Q&A functionality for in-depth context-based dialogue and offers a Quiz Canvas with AI-generated questions to reinforce learning through practice and feedback. Built on a Monorepo architecture, Video Helper integrates Next.js for the frontend, FastAPI for the backend, Python programming, and SQLite with SQLAlchemy for data management. To get started, users have several paths: downloading a pre-built client, deploying the server through Docker, or building the tool from source if they are developers. Furthermore, Video Helper can be integrated as an AI skill in editors like Claude Code and GitHub Copilot without needing backend LLM configuration. The project is community-driven, open to contributions under an MIT license, emphasizing scalability and efficient code maintenance. Keywords: #phi4, AI-powered, Alembic, Bilibili, Docker, Electron, FFmpeg, FastAPI, GitHub Copilot, LLM analysis, Monorepo architecture, Next.js, Open Source Community, ReactFlow, SQLAlchemy, SQLite, Tiptap, Video Helper, Whisper, YouTube, interactive linkage, mind maps, multi-turn Q&A, quiz canvas, summaries, uv, video learning
    github.com 4 days ago
   https://github.com/LDJ-creat/video-helper   4 days ago
917.  HN Better-CLI: A Skill that teaches agents best practices for improving CLIs
Better-CLI Skill is designed to enhance Command Line Interfaces (CLIs) by embedding best practices that cater to both human users and AI automation pipelines, with installation options across platforms such as Claude Code, ClawHub, npm, and GitHub Copilot. The skill emphasizes disciplined output routing, keeping data on standard output (stdout) and error messages on standard error (stderr) so the two never mix. It promotes structured data through machine-readable formats like `--json`, enhancing automation capabilities. Detailed actionable errors are included in the design, providing error codes, solutions, and retry hints for better troubleshooting. Every prompt has a non-interactive bypass, so the CLI never blocks automation. Additionally, Better-CLI includes TTY awareness to adapt outputs to different environments like terminals or pipes. The primary goal of Better-CLI is to ensure AI agents can interpret CLI command outputs unambiguously, improving efficiency in automation tasks. It supports a range of agent platforms with comprehensive manifests and focuses on core principles such as output guidance, error handling, interactivity management, composability, discoverability, security considerations, and rigorous testing protocols. Target audiences for Better-CLI include AI agents engaged in developing CLI tools, developers aiming to create CLIs that are accessible to both humans and AI without sacrificing user experience, and teams seeking to standardize CLI design patterns across projects. The skill is specifically intended for command-based CLIs with structured outputs, excluding full-screen TUI applications, interactive dashboards, or GUI applications, and it operates under the Apache-2.0 license.
Keywords: #phi4, AI agents, Apache-2.0, Better-CLI, CLI tools, CLIs, JSON envelopes, Skill, TTY-aware, actionable errors, best practices, checklist, command-based, decision tree, error handling, installation, interactivity, manifests, platforms, publishing, security, structured output, testing
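The stdout/stderr discipline, a `--json` flag, and TTY awareness can all be sketched together in a few lines of Python. This is a minimal illustration of the practices the skill teaches, not its code; `mytool` and its output shape are invented:

```python
import argparse
import json
import sys

def main(argv=None):
    parser = argparse.ArgumentParser(prog="mytool")  # hypothetical CLI
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    args = parser.parse_args(argv)

    result = {"status": "ok", "items": 3}
    if args.json:
        # Structured data goes to stdout only; agents can parse it safely.
        json.dump(result, sys.stdout)
        sys.stdout.write("\n")
    elif sys.stdout.isatty():
        # Human at a terminal: friendly formatting is fine here.
        print(f"Done: {result['items']} items processed")
    else:
        # Piped: plain, stable output with no decoration.
        print(result["items"])
    # Diagnostics never pollute stdout.
    print("completed without warnings", file=sys.stderr)
    return 0

main(["--json"])  # prints {"status": "ok", "items": 3}
```

Because the JSON envelope is the only thing on stdout, an agent can run `mytool --json | jq .status` style pipelines without parsing decorative text.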
    github.com 4 days ago
   https://github.com/yogin16/better-cli   4 days ago
   https://github.com/lorelang/lore   4 days ago
   https://github.com/googleworkspace/cli   4 days ago
   https://github.com/googleworkspace/cli/pull/2   4 days ago
991.  HN Show HN: Agent Office – Slack for (OpenClaw Like) AI Agents
Agent Office emerges as an innovative workspace manager designed to streamline the orchestration of AI coding agents, drawing parallels with popular platforms like Slack. Utilizing Raspberry Pi hardware and optionally Docker for enhanced isolation, it introduces a range of features aimed at optimizing task management and inter-agent communication. Central to its functionality is a tick-based scheduling system that efficiently manages agent tasks using priority queues and inter-process communication (IPC). This ensures seamless coordination among agents while maintaining robust file access control through cross-agent file sharing capabilities. Additionally, the platform supports proactive cron jobs and YAML configurations for streamlined setup processes. For various organizational needs, Agent Office offers flexible setups including basic teams, OpenServ teams, or feature teams integrated with Kanban boards. Installation is straightforward, requiring environment variable settings and development commands to initiate a Docker-sandboxed server for secure isolation. The architecture revolves around a YAML configuration file that directs agents managed via command-line interface (CLI) or web-based user interfaces (Web UI). Key components like the Scheduler, MessageBus, TaskService, and CronService play crucial roles in orchestrating workspace operations. Agents can either run in-process or within isolated Docker containers, enhancing security. Security is a cornerstone of Agent Office, with support for OAuth authentication facilitating secure access to model providers without the need for API keys. This feature extends compatibility across various providers such as OpenAI and Anthropic, ensuring flexibility and secure agent interactions. Offices, defined via YAML files, represent teams sharing configurations, environment variables, secrets, cron jobs, tasks, agents, and permissions. 
The permission system dictates access levels to tools and operations like managing cron jobs, maintaining structured control over workspace activities. The platform excels in task management with a built-in mechanism for scheduling tasks through cron jobs, supporting proactive execution and dependency management akin to Kanban boards. Sandbox modes further enhance security by isolating agents within Docker containers to prevent unauthorized access or privilege escalation. Interaction between sandboxed agents and the host system is facilitated through a comprehensive Host API. This API ensures secure operations with features like secret isolation, request limits, and anti-SQL injection protections, reinforcing the platform's security framework. The document also highlights runtime operations managed via REST API endpoints alongside Web UI controls. Agents can be hired or fired, messages sent, prompts updated, configurations reloaded, and organizational charts displayed through these interfaces. Dynamic model discovery allows users to select from various providers' models efficiently using a REST API endpoint that fetches this data. Execution commands are available both via the Web UI and REST APIs, with additional CLI commands for office creation, validation, and migration operating outside of runtime environments. The security measures include authenticated endpoints requiring session cookies and CSRF headers to ensure secure interactions. Agents utilize defined tools for communication, maintaining a system where outputs remain non-visible to users directly. Task notifications automatically update task creators on status changes like in-progress or completed tasks, ensuring transparency within the workspace. The document further describes prompt systems delivering layered prompts with identity details and custom instructions, managed through versioning and customization options. 
The scheduler's tick-based mechanism ensures priority execution at regular intervals while sandbox modes provide isolated environments for both offices and individual agents. Skill management involves markdown files that enhance agent functionality, accessible via commands or a Web UI Skills Manager, emphasizing on-demand loading to minimize prompt size. Persistence mechanisms include watchdog systems monitoring heartbeats and SQLite databases ensuring message durability across restarts. Channel management allows seamless communication, with APIs supporting creation, updates, and deletion of channels maintained consistently across sessions. Cost tracking monitors resource usage per agent, providing insights into token consumption over varying periods. The platform's web UI offers real-time interactions through a secure dashboard supported by session cookies for authentication and CSRF protection. Development environments leverage TypeScript and React, requiring Docker for sandbox testing, ensuring feature reliability. Overall, Agent Office provides a comprehensive framework designed to enhance AI coding agent management within team-oriented workspaces, focusing on security, persistence, and efficient collaboration across both in-process and containerized environments. Keywords: #phi4, AI, Agent, Agent Lifecycle, Authentication, CLI, Channel Management, Collaboration, Configuration, Cost Tracking, Cron Jobs, Dependencies, Development, Docker, Environment Variables, File Access, Heartbeat, Heartbeat Monitoring, IPC, Integration, Isolation, Kanban Board, Message Bus, Message Persistence, OAuth, Office Management, Permissions, Project Structure, Prompt Truncation, Proxy, REST API, Sandbox, Sandbox Mode, Scheduler, Secrets Management, Security Model, Session History, Skill Management, Skills, Slack, Task Management, Task Orchestration, Testing, Tools, Watchdog, Watchdog Behavior, Web UI, Workspace, YAML
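The tick-based, priority-queue scheduling idea can be sketched with Python's heapq. This is a toy model of the concept only; Agent Office's real Scheduler differs, and the agent and task names here are invented:

```python
import heapq
import itertools

class TickScheduler:
    """Toy tick-based scheduler: lower priority number runs first."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, priority, agent, task):
        heapq.heappush(self._queue, (priority, next(self._counter), agent, task))

    def tick(self, budget=2):
        """Run up to `budget` tasks this tick, highest priority first."""
        done = []
        while self._queue and len(done) < budget:
            _, _, agent, task = heapq.heappop(self._queue)
            done.append(f"{agent}:{task}")
        return done

sched = TickScheduler()
sched.submit(5, "researcher", "summarize-thread")
sched.submit(1, "reviewer", "approve-pr")
sched.submit(1, "reviewer", "triage-bug")
print(sched.tick())  # the two priority-1 tasks run first
print(sched.tick())
```

A per-tick budget like this is what lets high-priority agent work preempt background tasks at regular intervals without starving the queue.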
    github.com 4 days ago
1069.  HN Conductor – Scalable Workflow Orchestration Engine for Microservices
Conductor is a scalable workflow orchestration engine specifically designed for microservices architecture, facilitating the creation and execution of complex multi-agent workflows with tools like GitHub Copilot SDK and Anthropic Claude. Unlike traditional systems that rely on single LLM prompts, Conductor offers enhanced capabilities through iterative refinement via evaluator-optimizer loops, supports parallel execution with built-in failure handling mechanisms, and integrates human-in-the-loop interactions for improved workflow management. Key features of Conductor include the ability to define workflows using YAML, compatibility with multiple AI providers such as GitHub Copilot and Anthropic Claude, conditional routing based on predefined criteria, and the implementation of safety measures like maximum iteration limits and timeouts. A web dashboard is provided to enable real-time visualization and monitoring of workflows, ensuring users can track progress and performance efficiently. Conductor can be installed using various methods including uv, pipx, or pip, with flexibility in specifying branches or tags to suit different user needs. The command-line interface (CLI) offers comprehensive commands for running, validating, and initializing workflows, alongside development tools that support testing, linting, and type checking, facilitating a robust development environment. The project actively encourages contributions from the community under a Contributor License Agreement (CLA) and upholds the Microsoft Open Source Code of Conduct to ensure an inclusive and collaborative environment. Conductor is distributed under the MIT license, offering broad usage rights while respecting trademark guidelines, thereby promoting its adoption across diverse applications. 
Keywords: #phi4, AI Providers, API Key, Anthropic Claude, CLI Tool, Conductor, Contributor License Agreement, Development, Documentation, GitHub Copilot, Human-in-the-loop, Linting, MIT License, Microservices, Microsoft Open Source Code of Conduct, Multi-agent Workflows, Parallel Execution, Python, Safety Limits, Testing, Trademarks, Type Checking, Web Dashboard, Workflow Orchestration, YAML, pip, pipx, uv
    The google logo   github.com 5 days ago
1094.  HN AI Is Writing Your Code. Now It Must Govern Your Architecture
The article explores the evolving role of artificial intelligence (AI) in software development, shifting from mere code generation to influencing software architecture itself. Traditionally, software architectures have adapted according to primary constraints such as hardware limitations initially and later focusing on human comprehension due to increasing system complexity. This evolution has prioritized readability and modularity for effective collaboration among developers. With the advent of AI coding assistants like GitHub Copilot, there is an emerging paradigm where AI is poised to become a predominant code producer. This potential shift necessitates a transformation in software architecture from being primarily designed for human use to one that accommodates AI interaction effectively. To align with AI systems' operational needs, future architectures must be explicit, machine-readable, and formally constrained, marking a departure from conventional approaches centered around human understanding. Consequently, as AI continues to play an increasing role in development processes, it is crucial for architectural frameworks to adapt by integrating elements that facilitate both human oversight and seamless AI integration. This evolution will ensure software systems remain efficient, adaptable, and comprehensible within the new AI-augmented landscape of software engineering. Keywords: #phi4, AI, Architecture, Boilerplate Code, Clean Architecture, Code, Constraints, Cursor IDE, Design Patterns, Evolution, Explicit Structure, Formally Constrained, GitHub Copilot, Hardware Limitations, Hexagonal Architecture, Human Comprehension, Machine-Readable, Refactorings, Software Systems
    The google logo   medium.com 5 days ago
1126.  HN Show HN: Geo-lint – Claude Code skill that auto-fixes SEO/GEO violations in loop
Geo-lint is an open-source tool designed to enhance content quality by focusing on Generative Engine Optimization (GEO), addressing both SEO and GEO-specific challenges through deterministic rules across Markdown and MDX files. It ensures consistent outputs via 92 predefined rules related to SEO, GEO, content quality, and technicality. Geo-lint operates as a Claude Code skill with an autonomous lint-fix loop that independently auto-corrects content by running subagents in parallel on multiple files, iterating up to five times until all issues are resolved. It is particularly tailored for AI search engines like ChatGPT and Perplexity by optimizing content structure, E-E-A-T signals, and citation-ready statistics. To use Geo-lint, users can install it via a command-line script or npm with the command `npm install -D @ijonis/geo-lint`. Configuration is done through a `geo-lint.config.ts` file where site details and content paths are specified. Users can execute various commands for auditing (`/geo-lint audit`), fixing specific files (`/geo-lint fix <slug>`), and more for reporting and setup. Geo-lint supports compatibility with AI agents such as Claude Code, Cursor, and Windsurf, and accommodates different content formats via custom adapters. It integrates seamlessly into CI pipelines and can be employed programmatically through its API. The tool automates the optimization process across multiple sites, ensuring adherence to SEO and GEO best practices, thereby enhancing visibility in AI-driven search engines without requiring manual intervention, providing a comprehensive solution for maintaining high-quality digital content standards. Keywords: #phi4, AI agents, AI search engines, Claude Code, GEO, Generative Engine Optimization, Geo-lint, MDX, Markdown, SEO, content optimization, deterministic rules, lint loop, open-source linter
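A `geo-lint.config.ts` along the lines described might look like this sketch. Only the file name and the install command (`npm install -D @ijonis/geo-lint`) come from the summary; the field names below are assumptions:

```typescript
// Hypothetical geo-lint.config.ts -- the schema here is a guess,
// only the file name comes from the project's README.
export default {
  site: {
    name: "example.com",
    baseUrl: "https://example.com",
  },
  content: {
    // Markdown/MDX paths the linter should scan
    paths: ["content/**/*.md", "content/**/*.mdx"],
  },
};
```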
    The google logo   github.com 5 days ago
1156.  HN Show HN: Making remote MCP servers handle local files and generated artifacts
The Remote MCP Adapter serves as a critical link between client-side operations and remote Model Context Protocol (MCP) servers by addressing challenges related to file accessibility and artifact retrieval when these servers are not locally available. It enables tools that require local files to interact with them remotely through mechanisms like staging client-side files for upstream use and capturing output artifacts for client access. The adapter features a multiserver relay capability, allowing multiple MCP servers to be accessed via a single gateway. Its file handling functionality includes managing uploads and outputs using designated handles, while session management ensures isolation and provides optional "revival" upon reconnection. The adapter supports different state storage backends such as in-memory, SQLite, or Redis and incorporates upstream health monitoring with active checks and circuit breakers to prevent failures. It enhances resilience by automatically retrying and reconnecting when upstream sessions drop. Security is a priority, with authentication handled via bearer tokens and signed upload URLs. Observability features include OpenTelemetry metrics collection and optional log export, ensuring detailed insights into operations. Safe storage practices are implemented through atomic writes, orphan cleanup, and quota enforcement. Integration with various tools like Playwright MCP, GitHub Copilot, and Antigravity is facilitated by adding configuration entries in their respective config files. Users can set up the adapter using Docker Compose or build it from source with Python 3.12+ and uv. Comprehensive documentation covers setup, configuration, security, telemetry, and troubleshooting aspects. The adapter is freely available under an MIT license at its GitHub repository. 
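Registering the adapter on the client side typically means pointing an MCP client at the adapter's endpoint. As a rough sketch, following the `mcpServers` convention several MCP clients use (the URL and token are placeholders, and the exact key names for any given client may differ):

```json
{
  "mcpServers": {
    "remote-adapter": {
      "type": "http",
      "url": "https://adapter.example.com/mcp",
      "headers": {
        "Authorization": "Bearer <token>"
      }
    }
  }
}
```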
Keywords: #phi4, Antigravity, Docker Compose, GitHub Copilot, MCP, MIT license, MkDocs documentation, OpenTelemetry, Playwright, Python 3.12+, adapter, artifact_producer, artifacts, atomic writes, authentication, bearer tokens, circuit breaker, configuration, config.yaml, file outputs, file uploads, health checks, healthz, local files, metrics, observability, quota limits, regex, remote server, resilience, retry mechanism, session isolation, sessions, staging, state backends, telemetry, upload handles, upload_consumer, uv
    The google logo   github.com 5 days ago
1166.  HN AI Tooling for Software Engineers in 2026
The 2026 AI tooling survey among software engineers highlights significant trends and preferences in the utilization of artificial intelligence within the field. Claude Code has quickly become the most popular AI coding tool, overtaking established competitors like GitHub Copilot and Cursor within eight months since its launch in May 2025. The widespread adoption of AI tools is evident, with 95% of respondents using them weekly, and about 75% relying on these tools for at least half their tasks, signifying a deep integration into daily workflows. The survey reveals distinct usage patterns based on company size and leadership roles; Claude Code is particularly favored in smaller companies and by senior leaders. In contrast, GitHub Copilot remains prevalent among larger enterprises due to robust enterprise marketing from Microsoft, while Cursor maintains growth despite competition from newer tools like OpenAI’s Codex, Gemini CLI, and Antigravity. Anthropic's Opus and Sonnet models are preferred for coding tasks, indicating a strong preference for these specific AI models. The use of AI agents is also on the rise, with 55% of respondents regularly employing them to enhance code review, task automation, and debugging processes. Tool preferences are notably influenced by company size, as smaller companies show a predilection towards Claude Code and Codex, while larger organizations continue to prefer GitHub Copilot. Among engineers, Claude Code is most cherished, particularly at senior levels, followed by Cursor. Other tools such as Warp, Zed, Amp, Cline, RooCode, and Continue.dev are valued for their innovative features. The survey's demographic composition included a diverse set of respondents from the US and Europe with varied years of experience and company sizes. 
In summary, AI tool usage is becoming an integral part of software engineering, with Claude Code leading current trends due to its rapid rise in popularity, while GitHub Copilot retains significant influence within larger organizations. The increasing adoption rates suggest that these tools are now crucial components of the industry's operational landscape. Keywords: #phi4, AI agents, AI market, AI models, AI tools, AI trends, Anthropic, Antigravity, Claude Code, Codex, Gemini CLI, GitHub Copilot, OpenCode, Opus, Sonnet, agent usage, company size, demographics, engineering work, mainstream adoption, software engineers, survey findings, tool preference, tool usage
    The google logo   newsletter.pragmaticengineer.com 5 days ago
1170.  HN Awesome Agent Harness Engineering
Agent harness engineering is a process that focuses on creating environments, constraints, and feedback mechanisms to ensure the scalability and reliability of AI coding agents. This involves constructing an infrastructure around a Large Language Model (LLM) agent, encompassing session management, tool design, architectural enforcement, failure recovery, and human oversight. The primary focus for engineers in this field is environment design rather than direct code writing. Information that remains undocumented is not accessible to the agents, as repositories serve as the official system of record. Agent configurations are streamlined with details centralized in an AGENTS.md file, while architecture is enforced through automated tools such as linters and continuous integration checks instead of manual reviews. A key consideration is prioritizing code readability for AI agents over human readability. The ecosystem supporting agent harness engineering includes a variety of tools and frameworks that cover the entire lifecycle from full platform solutions to specific coding agents and standards protocols. These tools facilitate parallel execution, manage issue-to-pull request workflows, enhance context discovery, provide persistent capabilities, and support specification generation for AI agents. Seminal references in this field include OpenAI's experience in building substantial codebases with minimal human intervention and Anthropic’s approach of using progressive disclosure and expressive tools to design effective agent environments. The document encourages contributions to expand the list of resources and tools pertinent to agent harness engineering. 
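The AGENTS.md file mentioned above is simply a markdown file at the repository root that centralizes agent-facing configuration. A minimal hypothetical example (the contents are illustrative, not from any specific repository):

```markdown
# AGENTS.md -- illustrative sketch

## Build & test
- `make test` must pass before any commit.

## Architecture rules
- All database access goes through `internal/store`; never import
  the SQL driver elsewhere (enforced by a CI lint, not by review).

## Conventions
- Prefer small, explicit functions over clever abstractions:
  readability here is for the agent, not just the human.
```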
Keywords: #phi4, ACP, AI Coding, Agent Harness, Agent-First World, Anthropic, Claude Code, Codex, Engineering, Feedback Loops, Frameworks, Harness Engineering, Infrastructure, LLM Agents, MCP, OpenAI, Orchestrators, Progressive Disclosure, Protocols, Repository Knowledge, Runtimes, Session Management, Specifications, Standards, Task Runners, Tool Design
    The google logo   github.com 5 days ago
1184.  HN Show HN: Zsh helpers for LLM Git diff review
The document outlines Zsh helper functions named `claudiff` and `copdiff`, designed to enhance Git diff reviews by integrating AI models like Claude Code CLI and GitHub Copilot CLI. These functions automate the process of piping specified ranges of Git diffs into these AI tools for various code review tasks, including examining specific commits, uncommitted changes, staged modifications, pull requests, and updates since the last tag. The workflow involves checking out a branch, selecting an appropriate Git diff range, capturing this output in temporary files, passing it to the AI tool in "Ask" mode with context access, and subsequently cleaning up the temporary files. To install these functions, users need to add `claudiff` or `copdiff` definitions into their `.zshrc` file based on the preferred AI model. Each function requires specifying a Git diff range and a review prompt; it then creates a temporary file containing the diff, feeds this data into the CLI tool, and removes the file after the analysis is complete. The document provides example prompts for different types of code reviews such as generating commit messages, conducting security analyses, assessing architectural impacts, identifying testing requirements, among others. It also includes various expressions to help users define suitable Git diff ranges for review. Licensed under MIT, these tools aim to streamline and enhance the efficiency of AI-assisted code reviews. Keywords: #phi4, Architecture, Audit, CLI, Code quality, Commit, Diff, Feature branch, Git, LLM, Merge, Observability, Onboarding, Performance, Post-rebase, Pre-merge, Pull request, Rebase, Refactoring, Review, Risk, Security, Staged changes, Testing, Uncommitted changes, Zsh
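The pattern the helpers implement (capture a diff range to a temporary file, hand it to an AI CLI, clean up) can be sketched generically. `reviewdiff` below is a simplified illustration of that workflow, not the project's actual `claudiff`/`copdiff` code; it takes the AI command as arguments rather than hard-wiring one:

```shell
# Sketch of a claudiff/copdiff-style helper. Usage:
#   reviewdiff <git-diff-range> <cmd...>
# e.g. reviewdiff HEAD~3..HEAD claude -p "summarize this change"
reviewdiff() {
  local range="$1"; shift
  local tmp
  tmp="$(mktemp)" || return 1
  # capture the requested diff range into a temporary file
  git diff "$range" >"$tmp" || { rm -f "$tmp"; return 1; }
  # pipe the captured diff into the given CLI tool
  "$@" <"$tmp"
  local rc=$?
  rm -f "$tmp"    # clean up the temporary file afterwards
  return $rc
}
```

Substituting `cat` for the AI command is a quick way to check exactly which diff the helper would send.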
    The google logo   github.com 5 days ago
1271.  HN The Rise of the Financial Engineer
By 2026, the automation of coding tasks by AI tools such as Claude Code is reshaping software engineering, shifting focus toward tackling more complex issues like developing revenue generation systems. This transition has given rise to a new field emphasizing pricing, metering, and billing infrastructure, leading to the emergence of "Financial Engineers." These professionals are domain experts specializing in monetization strategies rather than broad generalists. The demand for Financial Engineers is driven by four critical forces: the significant cost implications associated with AI interactions making engineering decisions financially consequential; dynamic cost structures that require agile adaptation due to frequent changes in model pricing and usage; outdated traditional monetization systems struggling to keep pace with rapid AI product evolution, necessitating modernized infrastructure; and the need for sophisticated tools to manage complex cost structures within diverse customer organizations. Companies like OpenAI and Anthropic have responded by forming dedicated financial engineering teams tasked with overseeing the entire lifecycle of software monetization. This includes managing entitlements, metering, pricing architecture, billing integration, and usage governance. The accompanying newsletter aims to offer in-depth technical insights into constructing a modern SaaS monetization framework, providing valuable guidance for engineers and leaders facing these new challenges. Keywords: #phi4, AI Agents, AI Tools, API Calls, AWS Cost Explorer, Anthropic, Billing Engineers, Billing Integration, Credit Systems, Domain Experts, Enterprise Scale, Entitlements, Financial Automation, Financial Engineering, Financial Stack, Generalist Engineer, Gross Margin, Marginal Cost, Metering, Monetization, Monetization Infrastructure, NetSuite, OpenAI, Payments, Pricing & Packaging, Pricing Models, Revenue Infrastructure, Revenue Recognition, SaaS, Stigg, Usage Governance
    The google logo   thefinancialengineer.substack.com 6 days ago
1288.  HN GitHub Copilot is now #3 in VS Code installs behind Claude/OpenAI
GitHub Copilot is now only the third most installed AI extension for Visual Studio Code, trailing extensions from Claude and OpenAI. The linked post lives on x.com, which requires JavaScript; visitors with it disabled see only a notice directing them to enable JavaScript in their browser settings or switch to one of the supported browsers listed in the Help Center. Keywords: #phi4, Claude, GitHub Copilot, Help Center, JavaScript, OpenAI, VS Code, browser, installs, supported browsers, x.com
    The google logo   twitter.com 6 days ago
1327.  HN What VSCode type IDE to use to avail of open source models for code gen / comp
The user is exploring cost-effective alternatives to GitHub Copilot for code completion and generation within Visual Studio Code, due to the latter's tendency to deplete credits quickly. They are interested in integrating open-source models like Ollama into VSCode to achieve similar functionalities without incurring significant costs. Additionally, they seek recommendations on alternative IDEs that provide comparable features at a lower price point or free of charge. As options in this area continue to evolve rapidly, the user requests guidance on current best practices and tools for configuring their development environment effectively with these open-source solutions. Keywords: #phi4, GitHub Copilot, IDEs, SOTA (State of the Art), VSCode, code completion, code generation, configuration, credits, ollama type models, open source models, options, space tracking
    The google logo   news.ycombinator.com 6 days ago
1367.  HN Engineering Guide for AI Enterprise Coding Tools
This guide serves as a comprehensive resource for platform engineers tasked with evaluating AI coding tools suitable for enterprise environments. It emphasizes critical evaluation criteria such as security, compliance, codebase intelligence, team adoption, workflow models, and integration depth. Among the reviewed tools are GitHub Copilot, Claude Code, Cursor, Tabnine, Amazon Q Developer, Qodo, Windsurf, and Google Antigravity, with notable mentions of Tabnine and Windsurf for their superior privacy features and adherence to government compliance standards. The guide addresses challenges such as integrating AI into legacy systems where codebase intelligence may be inconsistent across different tools. It highlights the importance of enhancing team collaboration through AI tools rather than replacing individual expertise, stressing that effective adoption requires careful consideration of governance and workflow integration. Tools like Qodo are recognized for their robust workflow models, although ease of integration varies among platforms. Additionally, the guide advises platform engineers to set realistic expectations about productivity improvements from AI tools with leadership and manage developer concerns regarding job security. It recommends a strategic approach to tool selection based on specific workflow requirements, starting with fundamental features such as autocomplete and progressively expanding capabilities. To mitigate resistance from developers, it suggests strategies like clear communication, piloting tools among skeptics, and leveraging peer adoption. Ultimately, the guide underscores the importance of aligning AI coding tool choices with both technical needs and organizational objectives, ensuring a comprehensive assessment of all pertinent factors to facilitate successful implementation within enterprises. 
Keywords: #phi4, AI coding tools, Amazon Q, Claude Code, Cursor, GitHub Copilot, QA processes, SOC compliance, Tabnine, codebase intelligence, compliance, developer resistance, enterprise, governance, integration depth, job security, pilot testing, platform engineers, productivity, security, team adoption, tooling strategy, workflow model
    The google logo   qa.tech 6 days ago
1375.  HN Field notes from the circus of corporate AI adoption
Over a two-year period, the company described in these field notes moved through a familiar arc of AI adoption: initial enthusiasm driven by corporate hype and fear of missing out (FOMO) led to the establishment of an official AI strategy. However, this translated into ineffective initiatives such as the "Prompt-a-Thon," where teams struggled to find meaningful use cases for AI due to inadequate understanding and resources. This misalignment was further exemplified when a team used unapproved AI tools because IT policies were more budget-driven than innovation-oriented. The company’s approach was also evident during an executive meeting with a hyperscaler company, which prioritized flashy presentations over substantial discussions on AI's actual potential. The culmination of these issues occurred in an "AI Strategy Workshop," where poorly articulated ideas and misaligned visions highlighted the gap between leadership’s aspirations for AI and its practical implementation. Despite recognizing that genuine AI solutions demand careful development and integration, the company continued to focus on hype-driven adoption aimed at external validation rather than achieving real utility. This pattern underscored a criticism of corporate AI initiatives that prioritize spectacle over meaningful application, often neglecting valuable use cases requiring careful consideration to truly benefit organizations. Keywords: #phi4, AI adoption, Claude Code, GitHub Copilot, Hyperscaler X, IT department, LLM products, Prompt-a-Thon, agentic AI, bespoke solutions, corporate AI, executive meeting, hype, implementation, innovation, misuse, post-it notes, productivity, strategy, technical architect, voting process, workshop
    The google logo   mildlyverbose.mataroa.blog 6 days ago
1405.  HN Show HN: The Playwright GitHub Repositories Worth Studying
The article provides comprehensive guidance on effectively utilizing Playwright for end-to-end testing in web applications, focusing on common challenges developers encounter when setting up tests, such as failures in CI/CD environments and cluttered folder structures. It emphasizes the value of studying well-organized Playwright GitHub repositories to develop robust test automation frameworks. Key points include understanding initial challenges with Playwright, such as difficulties in maintaining project structure and ensuring consistent performance across different environments. The article highlights the importance of exploring these repositories for insights into best practices, architectural decisions, and scalable designs through real-world examples, CI/CD pipelines, and production-ready setups. The guide categorizes various Playwright GitHub repositories by language (TypeScript, Python, Java) and use case, recommending specific ones like Microsoft/playwright for TypeScript, playwright-python for Python developers, and microsoft/playwright-java for Java users. For beginners, it advises starting with simple JavaScript examples before progressing to TypeScript, while also suggesting video courses linked to particular Git branches for step-by-step learning. Beyond core Playwright tools, the article points out an ecosystem that includes resources for accessibility checks, performance monitoring, code quality, IDE support, and utility libraries. To effectively leverage these repositories, it advises evaluating them by examining maintenance status, structure, and configuration practices before use. This process involves checking the last commit date, Playwright version in `package.json`, unresolved issues, and configuration files like `playwright.config.ts` to ensure they employ best practices such as using environment variables instead of hardcoded URLs and maintaining structured folders. 
The article provides a methodical approach for utilizing these repositories: evaluating them before cloning by reviewing their maintenance status; cloning the repository, running tests, and breaking components to understand functionality; thoroughly analyzing configuration files for best practices like enabling retries only in CI and parallel execution configurations; and adapting elements from the repositories rather than copying them wholesale. The conclusion stresses that learning from Playwright GitHub repositories can greatly enhance automation skills by offering insights into real-world framework setups. Microsoft/playwright is particularly recommended for beginners due to its official patterns, while playwright-videos provides step-by-step guidance. While TypeScript is preferred for type safety and alignment with Playwright's design, JavaScript remains suitable for novices. Compared to Puppeteer, Playwright repositories offer a richer ecosystem of scalable test automation frameworks. Keywords: #phi4, AI Integration, Accessibility, Automation, BDD, Beginner-Friendly, Best Practices, Browser Automation, CI/CD, Code Quality, Community, Configuration, Core Web Vitals, Coverage Reports, Cucumber, Documentation, ESLint, Ecosystem, Enterprise-Ready, Feature Files, Fixtures, Framework, Gherkin Syntax, GitHub, IDE Support, Java, Kubernetes, Learning, Page Object Model, Parallel Execution, Performance, Playwright, Playwright Skill, Plugins, Python, Real-World Examples, Reporting, Repositories, Scalability, Test Automation, Testing, Tools, Trace Viewer, TypeScript, Utility Libraries, Video Course, WCAG Compliance
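The "retries only in CI" and parallel-execution practices mentioned above are typically expressed directly in `playwright.config.ts`, using standard Playwright options (the base URL and worker count here are placeholders):

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Retry flaky tests only on CI, never locally.
  retries: process.env.CI ? 2 : 0,
  // Run in parallel; cap workers on CI for stability.
  workers: process.env.CI ? 2 : undefined,
  use: {
    // Read the target URL from the environment, not hardcoded.
    baseURL: process.env.BASE_URL ?? "http://localhost:3000",
  },
});
```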
    The google logo   testdino.com 6 days ago
1419.  HN GitHub Copilot Goldeneye model preview
GitHub Copilot enhances its functionality by integrating a diverse array of AI models from multiple providers. These include OpenAI's GPT series (GPT-4.1, GPT-5 variants) supported through GitHub and Azure infrastructure; Anthropic's Claude models running on AWS, Anthropic PBC, and Google Cloud Platform; Google's Gemini models hosted by Google Cloud; and xAI's Grok Code Fast 1 model. Each provider maintains strict data handling policies: OpenAI and Amazon ensure no customer data is used for training or retained, while Anthropic's data management depends on feature availability. Similarly, Google Cloud does not utilize GitHub data for training purposes. xAI follows a zero data retention API policy. All models are equipped with content filtering to prevent harmful material dissemination and handle public code matches securely. To enhance service quality and reduce latency, GitHub uses prompt caching across these providers. Each provider adheres to specific commitments concerning user privacy and data protection, ensuring a high standard of data security throughout the ecosystem. Keywords: #phi4, AI models, AWS models, Amazon Bedrock, Anthropic PBC, Azure infrastructure, Claude Haiku 4.5, Codex, GPT-4.1, GPT-5 mini, Gemini 2.5 Pro, GitHub Copilot, Goldeneye, Google Cloud Platform, Grok Code Fast 1, OpenAI, Raptor mini, content filtering, data retention, enterprise privacy, harmful content, prompt caching, public code matching, service terms, xAI, zero data retention agreement
    The google logo   docs.github.com 6 days ago
1466.  HN Copilot Memory now on by default for Pro and Pro+ users in public preview
GitHub Copilot has introduced a new feature called Copilot Memory for its Pro and Pro+ users during a public preview phase. This feature is designed to enhance productivity by allowing Copilot to maintain a comprehensive understanding of the entire codebase at the repository level, which minimizes the necessity to repeatedly provide context. By retaining information about coding conventions, architectural patterns, and dependencies specific to each repository, Copilot Memory ensures that data remains up-to-date through an automatic expiration policy set for 28 days. The enhancement brought by Copilot Memory extends across multiple functionalities. It provides contextual support during task implementation and pull requests, augments code review feedback using recognized patterns, and integrates this awareness into terminal workflows via the Copilot CLI. The shared memory system allows knowledge acquired in one context to be effectively utilized across different tasks. For individual users on Pro or Pro+ plans, access to this feature is automatic but can be opted out of through personal settings. At an organizational level, enterprise administrators have control over memory access, while repository owners are empowered to manage stored memories via their respective repository's settings. Additional information and discussions on this feature are available in specified resources. Keywords: #phi4, CLI workflow, Copilot Memory, GitHub Copilot Pro, architectural patterns, automatic expiration, code review, coding agent, coding conventions, cross-file dependencies, enterprise policies, persistent knowledge, public preview, repository settings, repository-level understanding
    The google logo   github.blog 6 days ago
1523.  HN What AI Safety Means to Me
The text addresses concerns within tech companies about the rapid adoption of AI technologies like GitHub Copilot, which are perceived as overdue advancements. The author introduces the concept of "Safe AI" to describe a balance that maximizes societal benefits from superintelligence while avoiding excessive reliance that could lead to cognitive decline. Achieving this equilibrium is deemed crucial through comprehensive education at all levels. Furthermore, the author expresses an intention to develop these ideas into a full essay and encourages readers to stay informed about future updates via RSS feed or Substack. This summary encapsulates the main themes of concern regarding AI adoption, the definition and importance of "Safe AI," educational strategies for balance, and the author's plans for expanding on these topics. Keywords: #phi4, AI Safety, Cognitive Decline, Delicate Balance, Education, Enterprise, GitHub Copilot, Greenfield Startup, Integration, Productivity, RSS Feed, Substack, Superintelligence, Technology Adoption
    The google logo   olshansky.info 7 days ago
1565.  HN With a 5x increase in Show HN, who sees what you build?
Over the past three years, Hacker News (HN), a platform hosted by Y Combinator, has seen a significant increase in "Show HN" posts, with numbers nearly quintupling and an additional 230% rise within just the last three months. Despite this surge in submissions, user growth on HN remains stagnant, leading to a slight decline in overall traffic. This paradoxical trend underscores the challenge new software developers face in gaining visibility despite improvements in creating credible products aided by advancements such as AI code generation tools like GitHub Copilot. While developers maintain confidence in the quality and value of their creations, they struggle to capture attention on HN due to a saturated environment where posts typically receive minimal engagement, evidenced by stagnant median upvote counts. This situation highlights the critical need for human endorsements that can effectively draw user interest in an increasingly crowded digital landscape. Keywords: #phi4, AI code generation, Algolia search API, GitHub Copilot, Hacker News, MVPs, Paul Graham, Sam Altman, Show HN, SimilarWeb, Y Combinator, data analysis, exposure, feedback, human attention, product release, prototypes, software building, startups, tech news aggregator, traction, upvotes
    The google logo   www.quantable.com 7 days ago
   https://news.ycombinator.com/item?id=47045804   7 days ago
1640.  HN APM – Agent Package Manager (Microsoft)
APM (Agent Package Manager) is an open-source dependency manager tailored specifically for AI agents, enabling developers to define necessary components such as skills, prompts, instructions, and tools in a configuration file named `apm.yml`. This ensures uniform agent setups across different team members, operating similarly to other package managers like npm or pip but with a focus on AI configurations. Key features of APM include managing coding standards, AI capabilities (skills), reusable prompts, specialized personas (agents), and lifecycle event handlers (hooks). It integrates seamlessly with popular AI tools such as GitHub Copilot and Claude and supports automatic resolution of transitive dependencies. APM streamlines the development process by allowing new developers to quickly set up a fully configured agent environment through simple commands like `apm install` after cloning a repository. The tool also enables users to create, define, and share packages easily, promoting customization with personal standards or tools in an easy-to-publish format. Installation of APM is user-friendly and can be accomplished via command line scripts, Homebrew, or pip from various sources including GitHub repositories, single files, or Azure DevOps. The project adheres to open standards for AI-native development and provides comprehensive documentation, facilitating its usage and integration with other platforms. This makes APM a robust solution for managing dependencies in AI agent projects while fostering community-driven development and sharing. Keywords: #phi4, AGENTS.md, AI agents, APM, Agent Skills, GitHub Copilot, MCP Servers, dependency manager, instructions, lifecycle event handlers, manifest, prompts, skills, tool integrations, tools, trademarks
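An `apm.yml` along the lines described might look like the sketch below. Only the file name and the dependency categories (skills, prompts, instructions, hooks) come from the summary; the key names and source syntax are assumptions:

```yaml
# Hypothetical apm.yml -- key names are illustrative.
name: my-agent-setup
dependencies:
  skills:
    - github:example-org/code-review-skill    # reusable AI capability
  prompts:
    - github:example-org/release-notes-prompt
  instructions:
    - ./standards/coding-style.md             # local coding standards
hooks:
  post-install: ./scripts/verify-tools.sh     # lifecycle event handler
```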
    github.com 7 days ago
1648.  HN Show HN: I no longer monitor my coding agents, my desktop pet does
SwarmWatch is a desktop application designed to oversee and manage AI coding agents across multiple platforms such as macOS, Windows, Linux, and various IDEs including Cursor, Claude, Cline, GitHub Copilot, and VS Code plugins. It offers users real-time visibility into the activities of these agents through an always-on overlay interface that allows direct approval or rejection of actions. Key features include a bidirectional approval system for coding actions, execution logs to track agent activity, and a unique Tamagotchi-style dog that reacts to user interactions. The application operates locally via localhost communication. The architecture of SwarmWatch is built around a hook system comprising three components: the Runner (a native binary communicating through local WebSocket), Shims (scripts executing the runner with specific agent identities), and the Desktop app developed using Tauri v2, which displays agent states and prompts user approvals. Installation can be done directly using shell commands or PowerShell scripts as per provided documentation. Important considerations for users include adding generated hook files to `.gitignore` to prevent repository clutter, implementing a health probe when the UI is down, and managing an approval waiting time of 60 seconds for actions. Agents are designed to become inactive if no events occur within three minutes. The application emphasizes security by conducting all communications locally, with plans for future authentication additions. Future enhancements aim to expand support for additional agents/IDEs, introduce diverse avatars and reactions, improve the user interface, optimize performance, and integrate lightweight database support. As an open-source project under the MIT license, SwarmWatch invites contributions from developers interested in these advancements.
Keywords: #phi4, AI coding swarms, SwarmWatch, WebSocket, activity monitor, agents, approval, control plane, desktop pet, execution logs, hooks, open source, overlay, privacy, real-time view, security
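The 60-second approval window described above amounts to a bounded wait that fails closed when no answer arrives. A minimal sketch of that pattern in plain Python; the function and field names are illustrative, not SwarmWatch's actual API:

```python
import threading

def wait_for_approval(decision_event: threading.Event,
                      decision: dict,
                      timeout_s: float = 60.0) -> str:
    """Block until the user approves/rejects, or fail closed on timeout.

    Mirrors the behaviour described for SwarmWatch's 60 s approval
    window; names here are illustrative only.
    """
    if decision_event.wait(timeout=timeout_s):
        return decision.get("verdict", "rejected")
    return "rejected"  # no answer in time: reject by default

# Simulated UI thread approving the pending action:
event, decision = threading.Event(), {}

def ui_thread():
    decision["verdict"] = "approved"
    event.set()

t = threading.Thread(target=ui_thread)
t.start()
result = wait_for_approval(event, decision, timeout_s=5.0)
t.join()
print(result)  # -> approved
```

With no UI response at all, the same call simply times out and returns "rejected", which is the safe default for destructive actions.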
    github.com 7 days ago
1676.  HN Show HN: Term-CLI – interactive terminals for AI agents (for SSH/TUI/REPL flows)
Term-CLI is a sophisticated tool designed to facilitate AI agents' interaction with terminal sessions demanding real-time input/output such as SSH sessions, TUIs, REPLs, and debuggers. It enhances the execution of interactive commands by allowing precise keystroke management and prompt-based output handling within these terminals. Key features include in-band file transfer, which enables file movement through channels used for interactions, circumventing traditional methods like SCP/SFTP when they are unavailable. The tool supports human collaboration through Term-assist, enabling humans to assist with credentials and MFA prompts during terminal sessions, effectively bridging the gap between AI automation and manual intervention. Additionally, agents can manage commands within detached tmux-backed sessions that can be accessed by users for manual operations as necessary. This flexibility extends to handling TTY-first workflows that are otherwise difficult to automate non-interactively, such as installers or boot menus. Term-CLI is applicable in a variety of scenarios including running development servers, using debuggers, managing databases, and interacting with professional networking equipment via console access. The installation process requires Python 3.8+ and tmux, with simple setup instructions provided to streamline usage. A notable aspect of Term-CLI is its facilitation of human-AI collaboration, enabling seamless control transitions between AI agents and humans for tasks necessitating manual input, akin to a pair programmer or rubber duck dynamic. Overall, Term-CLI addresses the challenges associated with non-interactive command execution in terminal environments by offering robust error handling, human collaboration capabilities, and integrated file transfer functionalities. 
Its reliance solely on tmux and Python standard libraries ensures ease of integration without additional dependencies, making it an invaluable resource for complex interactive problem-solving scenarios. Keywords: #phi4, AI agents, REPL, SSH, TUI, command execution, detached sessions, file transfer, human collaboration, interactive terminals, skill integration, term-cli, terminal workflows, tmux
    github.com 7 days ago
   https://github.com/microsoft/playwright-cli   7 days ago
1677.  HN Claude Code rolls out a voice mode capability
Anthropic has launched a voice mode feature within Claude Code, an AI coding assistant aimed at enhancing developers' hands-free, conversational workflows. This feature is currently in a gradual rollout phase, available to about 5% of users, with intentions for wider distribution. Users can enable this function by entering `/voice`, allowing them to give spoken commands such as "refactor the authentication middleware." However, specific details regarding limitations and potential third-party collaborations have not been disclosed. Claude Code has established itself as a prominent player in the competitive AI coding assistant market, experiencing significant revenue growth and increased user adoption, partly due to its policy against the military use of AI technology. Keywords: #phi4, AI coding assistant, Anthropic, ChatGPT, Claude Code, Department of Defense, Disrupt 2026, ElevenLabs, GitHub Copilot, Google, OpenAI, TechCrunch, Thariq Shihipar, US App Store charts, Voice Mode, conversational workflows, developers, gradual release, hands-free, mobile app, run-rate revenue, spoken commands, technical constraints, third-party AI voice provider, weekly active users
    techcrunch.com 7 days ago
1696.  HN After 8 years on WordPress, I migrated to AstroJS Starlight. Here's the how-to
After eight years of managing their personal website on WordPress, the author transitioned to using AstroJS Starlight hosted on Cloudflare Pages due to several issues with WordPress, including maintenance challenges from excessive plugins, security vulnerabilities, absence of version control, sluggish performance, vendor lock-in, and high costs for static sites. The new site is designed as an open-source digital garden resembling an Obsidian vault, leveraging Markdown files managed via Git for complete content ownership and history tracking. The migration process involved exporting WordPress content to Markdown, configuring Starlight, utilizing AI tools such as GitHub Copilot for coding tasks, deploying on Cloudflare Pages for rapid global delivery, and enhancing features like SEO infrastructure and mobile responsiveness. The author experienced numerous benefits from this transition: cost efficiency, improved speed, robust version control, open-source accessibility, and a more adaptable development environment. However, the shift resulted in the loss of WordPress's built-in comments system. The author advises others considering similar migrations to start by exporting content early, setting up URL redirects, leveraging AI tools, and adopting an incremental approach for improvements. The site is now live, featuring an expanding knowledge base, and serves as a demonstration for those who might encounter friction with WordPress. Additionally, the source code is available on GitHub, inviting others to explore or collaborate on this open-source project. Keywords: #phi4, AI coding assistants, AstroJS, Cloudflare Pages, Git, GitHub, Lighthouse audits, Markdown, Nodejs, SEO, Starlight, WordPress, accessibility, comments system, digital garden, knowledge base, migration, open-source, performance, plugins, redirects, static site, version control
    pawelcislo.com 7 days ago
1709.  HN OnWatch – Track 6 AI API quotas from your terminal (<50MB RAM, zero telemetry)
`onWatch` is a Go-based command-line tool designed to streamline the monitoring of API quotas across six AI providers: Anthropic, OpenAI Codex, GitHub Copilot, Synthetic, Z.ai, and Antigravity. It functions as a background daemon that periodically fetches data from these APIs, storing usage history in an SQLite database while ensuring user privacy by not transmitting telemetry or relying on cloud services. The tool features a Material Design 3 web dashboard for visualizing quota consumption trends over time. Key design decisions include maintaining a compact binary without runtime dependencies (~13MB), using less than 50MB of RAM to poll all providers concurrently, and performing all operations locally to protect user privacy. `onWatch` is straightforward to install on macOS, Linux, or Windows through a one-line command or via Docker (distroless, non-root, ~10MB image). The tool was developed to overcome the limitations of existing provider dashboards that differ in billing cycles and formats and lack historical data analysis capabilities. It offers critical insights into usage trends across various billing periods, identifies sessions with high quota consumption, and aids in anticipating resets. Installation is simple: `curl -fsSL https://raw.githubusercontent.com/onllm-dev/onwatch/main/install.sh | bash`. Additional information can be found on its GitHub repository at [onllm-dev/onwatch](https://github.com/onllm-dev/onwatch). Keywords: #phi4, AI API quotas, Anthropic, Antigravity, Docker support, GitHub Copilot, Go CLI, Linux, Material Design 3 dashboard, OpenAI Codex, SQLite, Synthetic, Windows, Z.ai, background daemon, historical cycle data, install script, local data storage, macOS, no runtime dependencies, onWatch, polling, single binary, telemetry-free, terminal
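As a rough illustration of the trend analysis that stored usage history enables, here is a linear burn-rate estimate over made-up samples; this is not onWatch's code or database schema, just the underlying arithmetic:

```python
# Back-of-the-envelope sketch of the kind of quota-trend analysis a
# usage history enables. The sample data is invented, not onWatch output.
samples = [  # (hours_into_billing_cycle, fraction_of_quota_used)
    (0, 0.00), (24, 0.15), (48, 0.33), (72, 0.48),
]

def hours_until_exhausted(samples):
    """Linear burn-rate estimate from the first and last sample."""
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (t1 - t0)   # quota fraction consumed per hour
    remaining = 1.0 - u1
    return remaining / rate

print(round(hours_until_exhausted(samples), 1))  # -> 78.0
```

Comparing the projected exhaustion time against the cycle's reset date is what lets a user anticipate whether a quota will run out before it resets.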
    news.ycombinator.com 7 days ago
1767.  HN You are going to get priced out of the best AI coding tools
The article examines the rising costs associated with advanced AI coding tools, highlighting a shift from affordable options like GitHub Copilot to more expensive alternatives such as Claude Code, which charges $100 per month. This trend reflects an exponential increase in subscription prices, potentially reaching up to $20,000 monthly for top-tier services, based on industry insights. Initially launched at low costs, AI language models (LLMs) have provided substantial value by outperforming human labor in cost-effectiveness. However, their escalating demand for enhanced performance and quicker results implies that higher costs are likely unavoidable. Despite possible advances in hardware efficiency and algorithm optimization, the author remains skeptical about these developments curbing price increases due to competitive pressures and significant technical constraints. In high-demand settings like AI labs, inference costs could soar to $200,000 annually per employee, while consumer pricing might stabilize around $20,000 due to limited computational resources. The article conveys a prevalent sentiment among AI experts that academic researchers may soon be priced out of accessing the best tools within two years. It calls for additional research into how demand and supply dynamics, alongside cost containment strategies, will shape the future landscape of AI technology. Keywords: #phi4, AI coding tools, Claude Code, Github Copilot, LLMs, Nathan Lambert, OpenAI, Pass@1, Pass@K, compute, demand, exponential trend, inference, pricing
    newsletter.danielpaleka.com 8 days ago
   https://caviar.global/catalog/custom-iphone/iphone   8 days ago
   https://caviar.global/catalog/custom-iphone/iphone   8 days ago
   https://idiallo.com/blog/paying-for-my-8-years-old-ride   8 days ago
   https://www.viblo.se/posts/ai-hobbycoding/   8 days ago
   https://news.ycombinator.com/item?id=47234325   8 days ago
   https://xkcd.com/768/   7 days ago
   https://synthetic.new   7 days ago
   https://openrouter.ai   7 days ago
1781.  HN AI Tooling for Software Engineers in 2026
As of 2026, a survey among The Pragmatic Engineer's subscribers revealed significant trends in AI tool usage among software engineers, with Claude Code emerging as the dominant coding tool shortly after its release in May 2025, surpassing GitHub Copilot in popularity. Claude Code is particularly favored by smaller companies and senior leaders, while larger enterprises continue to prefer GitHub Copilot due to procurement strategies. Mainstream adoption of AI tools is evident, with 95% of respondents using them weekly and integrating AI into at least half their work. Engineers often use multiple tools simultaneously, with Cursor and Codex showing notable growth. AI agents are increasingly used by senior staff engineers for tasks beyond code generation, such as reviews, debugging, and automating repetitive processes. This has contributed to heightened enthusiasm for AI technology among users. The choice of AI tool is influenced by company size; smaller teams tend towards Claude Code and Codex, while larger companies opt for GitHub Copilot due to procurement constraints. Despite some skepticism from those not using agents, users report greater excitement about the technology. The survey illustrates widespread adoption and integration of AI in software engineering workflows, reflecting a diverse demographic of experienced professionals across various regions. The comprehensive findings are detailed further in a 35-page report available to full subscribers. Keywords: #phi4, AI agents, AI market, AI models, AI tools, AI trends, Anthropic, Antigravity, Claude Code, Codex, Gemini CLI, GitHub Copilot, OpenCode, Opus, Sonnet, agent usage, company size, demographics, engineering work, mainstream adoption, software engineers, survey findings, tool preference, tool usage
    newsletter.pragmaticengineer.com 8 days ago
1855.  HN Rtk – reduce up to 90% of CLI noise and save agent tokens
RTK is an innovative tool designed to significantly reduce Command Line Interface (CLI) noise by compressing it by approximately 89%, thereby enhancing token efficiency across various AI platforms that use token-based pricing models. This compression capability enables users to extend their usage limits and achieve substantial cost savings. For example, during a typical coding session, RTK can decrease token consumption from around 210,000 to roughly 23,000, effectively preventing overflow in context windows. The tool optimizes the functionality of several platforms such as Claude Code Terminal, Cursor IDE, and OpenAI Codex Agent by maximizing users' existing plans. It extends session lengths and message limits while reducing API costs by about 70% for some tools, which is particularly advantageous given the restricted nature of free tiers and premium plan caps. RTK's compression benefits are applicable across various platforms with different pricing structures and usage limitations, making it a valuable asset in optimizing token consumption. Verified as of February 2026, RTK demonstrates broad applicability and cost-saving potential for diverse coding environments and tools, ensuring users can efficiently manage their resources within given constraints. This makes RTK an essential tool for developers looking to enhance productivity while minimizing expenses across multiple AI-powered platforms. Keywords: #phi4, AI tool, API costs, CLI, CLI noise, IDEs, RTK, agent tokens, coding session, commands, compression, context quality, context window, credits, limits, models, premium requests, pricing, real commands, savings, terminal outputs, token bill, usage caps, workflows
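The quoted figures are easy to sanity-check: going from roughly 210,000 tokens to 23,000 is indeed about an 89% reduction.

```python
# Verifying the compression figure quoted in the summary.
before, after = 210_000, 23_000
reduction = 1 - after / before
print(f"{reduction:.0%}")  # -> 89%
```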
    www.rtk-ai.app 8 days ago
1884.  HN Show HN: MD Feedback – Review AI Plans in Markdown via MCP
MD Feedback is a Visual Studio Code extension complemented by a Model Context Protocol (MCP) server, designed to streamline the review process for AI-generated markdown plans. It facilitates users in annotating these plans with Highlight, Fix, or Question annotations, enhancing the preparation phase before any coding begins. The tool integrates with 11 AI platforms like Claude Code and GitHub Copilot, either through exports or direct MCP workflows, providing real-time feedback on AI implementations. The review process involves writing markdown plans, utilizing keyboard shortcuts for annotations, and assessing AI-incorporated modifications through status badges and quality gates. Annotations are preserved as HTML comments in the markdown files, ensuring compatibility with Git, which supports continuity across version control operations. MD Feedback offers significant advantages such as early error detection by reviewing plans pre-implementation, maintaining session context across AI sessions to ensure seamless workflow continuation, and enabling team collaboration by preserving annotations through Git operations. Additionally, quality gates automatically evaluate progress with options for manual intervention. For setup, MD Feedback requires Node.js version 18 or higher. It offers customizable settings within VS Code to cater to different environments. Licensed under the SUL-1.0 license, it is available free of charge for personal and non-commercial use. Overall, MD Feedback enhances AI-assisted development by providing a structured mechanism that boosts accuracy, collaboration, and efficiency in coding projects. Keywords: #phi4, AI Agents, Annotations, Extensions, Git, HTML Comments, MD Feedback, Markdown, Nodejs, Protocol, Quality Gates, Review, VS Code
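Because annotations are stored as HTML comments inside the markdown file, tooling can recover them with a simple scan and they survive Git operations untouched. A minimal sketch, assuming a hypothetical `mdfb:` comment syntax (the extension's real format may differ):

```python
import re

# The annotation syntax below is hypothetical -- MD Feedback stores
# annotations as HTML comments, but its exact format may differ.
plan = """\
# Migration plan

<!-- mdfb:fix Use a transaction around steps 2-3 -->
1. Copy rows
2. Rewrite IDs

<!-- mdfb:question Why not batch this? -->
"""

ANNOTATION = re.compile(r"<!--\s*mdfb:(\w+)\s+(.*?)\s*-->")

annotations = ANNOTATION.findall(plan)
print(annotations)
# -> [('fix', 'Use a transaction around steps 2-3'),
#     ('question', 'Why not batch this?')]
```

Because HTML comments render invisibly in most markdown viewers, the annotated plan stays readable while the annotations remain machine-recoverable.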
    github.com 8 days ago
1894.  HN Gemini CLI Explained: Everything You Need to Know About Google's AI Coding Agent
Taylor Mullen, Principal Engineer at Google, provides insights into Gemini CLI, an influential AI coding tool he developed, which originated from a hackathon and evolved into a popular open-source command-line interface (CLI) on GitHub, now used by over a million people. A CLI offers a powerful text-based method to control computers directly through the operating system, facilitating tasks like file management and program execution without relying on graphical user interfaces (GUIs). This functionality becomes even more potent when integrated with AI agents, significantly enhancing productivity. Gemini CLI enhances productivity through parallelism and structured workflows, aiming for a potential 100x increase in efficiency. It acts as an executive assistant by integrating with Google Workspace to autonomously manage tasks such as scheduling. With advancements in AI models, CLIs are experiencing a renaissance due to their direct interfacing with system-level tools and lightweight operation across computing environments. Taylor demonstrates Gemini CLI's capability for autonomous debugging, where the tool processes GitHub issue URLs to suggest code fixes independently. The team efficiently manages multiple AI agents using orchestration techniques, ensuring quality through policy files and test-driven development (TDD). An iterative method known as the Ralph Wiggum Technique is employed, improving results by feeding AI outputs back into fresh contexts. As an open-source tool, Gemini CLI benefits from community contributions that enhance its trustworthiness and robustness. Its extensibility allows customization for specific industry workflows. The article outlines how to begin using Gemini CLI with Node.js installation steps, noting a cost-effective free tier. It also emphasizes unique features like unrestricted context windows, sandboxing options, and Google Workspace integration. 
Available through the Google Cloud console, Gemini CLI offers extensive customization via policy files and GEMINI.md configurations while prioritizing security with sandboxing support. Its integration with Google Workspace and open-source contributions position it ahead of competitors, offering flexible pricing models and customization for teams. The article concludes by underscoring Gemini CLI's transformative potential in making terminal use more efficient and AI-driven across diverse tasks beyond coding, highlighting its essential role as an interface between users and AI capabilities. Keywords: #phi4, AI coding tool, CLI tools, Docker, GEMINImd, Gemini CLI, Google, Google Cloud, Podman, Seatbelt, Taylor Mullen, billing, command-line interface (CLI), competitive landscape, extensibility, extensions, hackathon, incident reporting, open source, parallel agents, parallelism, pay-as-you-go, policy files, productivity, requests/day, sandboxing, terminal agents, trust verify, usage stats, workspace integration
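The GEMINI.md file mentioned above supplies project context to the agent. Since the article does not show its contents, the following sketch is one plausible layout rather than a required schema; treat every heading and command as illustrative:

```markdown
<!-- Hypothetical GEMINI.md sketch. The real file is free-form project
     context for Gemini CLI; nothing below is a required format. -->
# Project context

## Build & test
- `npm run build` compiles the TypeScript sources.
- `npm test` must pass before proposing any change.

## Conventions
- Prefer small, reviewable diffs.
- Never edit generated files under `dist/`.
```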
    www.theneuron.ai 8 days ago
1896.  HN Agent Policies; codify rules and automate agent guidance
The article introduces "Agent Policies," a system developed by Philipp Gayret and his team at Devleaps, aimed at improving software development through codified rules that guide AI Agents. Unlike rigid permissions or rules, Agent Policies provide flexible guardrails allowing AI Agents to self-correct deviations from intended actions, enhancing decision-making processes while ensuring control over potentially destructive behaviors. These policies complement permission systems by offering additional guidance, which can streamline workflows such as feature branching, using conventional commits, and automating pull requests. Implemented via the open-source Agent Policy Server, this platform caters to both company-wide automation of AI Agent guidance and individual use, reflecting a focus on Platform Engineering principles. The initiative addresses limitations in existing AI tools' permission frameworks by promoting enhanced control over AI Agents. Devleaps invites further exploration of their project and encourages engagement for more insights into effectively using AI guardrails with tools like Claude Code, GitHub Copilot, Gemini, and Codex. Keywords: #phi4, AI Agents, Agent Policies, Claude Code, Codex, Devleaps, Gemini CLI, GitHub Copilot, Platform Engineering, Terraform, automation, decision-making, feature branch, guardrails, guidance, open source, permissions, quality assurance, rules, self-correcting, software development, workflows
    blog.devleaps.nl 8 days ago
1907.  HN The Future Is AC/DC: The Agent Centric Development Cycle
The article explores the transition from traditional Continuous Integration (CI) to an Agent Centric Development Cycle (AC/DC), driven by advancements in code generation tools and agent technologies. AC/DC emphasizes asynchronous, batch operations resulting in larger, more complex commits that transform software development processes. The cycle involves four iterative stages—Guide, Generate, Verify, and Solve—operating at both micro (inner) and macro (outer) levels to align with specifications and standards. Development occurs within a sandbox environment, enabling intensive validation before code reaches the main repository, necessitating new strategies for change management traditionally handled post-build. The evolution of the development toolchain is crucial in this paradigm, requiring integration of tools like Cursor, Claude Code, Codex, and GitHub Copilot while ensuring consistent verification across platforms. Due to the unpredictable nature of AI-generated code, verification becomes essential, supported by a Trust and Verification Platform that offers deterministic analyses, AI-based reviews, and observability traces to ensure quality and security. Emerging practices suggest fine-tuning models for specific enterprise needs and employing specialized agents for tasks like repair or review. To successfully transition to AC/DC, organizations are advised to enhance verification with defined quality profiles, invest in remediation agents to manage technical debt, and actively manage software architecture through structured understanding and guidance tools. This fundamental shift focuses on robust validation, strategic use of AI tools, and enhanced verification to improve productivity while minimizing risks. 
Keywords: #phi4, AI Agents, Agent Centric Development, Code Generation, Continuous Integration, Dynamic Context Engine, Fine-tuning Models, Guide-Verify-Solve, Remediation Agents, Sandbox Environment, Software Architecture, Trust and Verification Platform, Verification
    www.sonarsource.com 8 days ago
1972.  HN Home Assistant can run DOOM
At a Home Assistant community meetup, attendees were inspired by a DOOM t-shirt to develop an innovative custom integration allowing the classic 1993 game to be played directly on the Home Assistant dashboard. This project, created using GitHub Copilot and Visual Studio Code within two hours, enables users to engage with DOOM through HACS (Home Assistant Community Store), tracking gameplay details such as active player status and session history. The successful development highlights the power of open-source architecture in fostering creative AI-driven experimentation. Although primarily intended for entertainment, this integration also suggests practical applications like lighting automation based on game activity. The project illustrates a seamless fusion of human creativity and machine efficiency, leveraging AI tools to enhance software development outcomes. Keywords: #phi4, AI tooling, DOOM, GitHub Copilot, HACS, Home Assistant, WebAssembly, architecture, automations, custom component, dashboard card, entities, integration, js-dos
    frenck.dev 8 days ago
2011.  HN Compiling English Security Policies into Deterministic Agent Guardrails
IronCurtain is an advanced framework designed to convert English-written security policies into deterministic enforcement rules specifically for AI agents with direct system access. This innovation is crucial as AI systems evolve from basic interface interactions to more autonomous operations, such as those seen in GitHub Copilot Workspace and Devin, where traditional security measures falter due to a semantic gap between high-level actions of the AI and low-level operating system syscalls. IronCurtain bridges this gap by employing "semantic interposition," which applies natural language-derived policies at critical architectural boundaries like execution contexts or network proxies for containers. The framework operates using two large language models (LLMs): one interprets the potential untrustworthiness of AI agents, while the other compiles human-readable security policies into executable logic. These policies are crafted in English and tested through scenarios that address edge cases to ensure reliability without relying on LLMs during actual runtime evaluations. At its core, IronCurtain uses a Model Context Protocol (MCP) to intercept and enforce policy rules before tool execution. For uncontrolled AI agents like Claude Code, the system employs containerized environments with network proxies to balance a seamless user experience with strict adherence to policies. In cases where escalation is necessary, human intervention is facilitated through structured requests. For TypeScript-generating agents, V8 isolates provide secure execution contexts with no direct system access. While IronCurtain offers a more nuanced approach than traditional syscall-level sandboxes by preserving context in its enforcement strategies, it has notable limitations due to its experimental status. 
These include instability with changing APIs, reliance on correct implementations of the MCP server, potential policy misinterpretations during compilation by LLMs, and performance overhead resulting from context switches and proxying. Given these considerations, IronCurtain is most suitable for research settings or developer tools where human oversight can be maintained. It provides a unique methodology to articulate and enforce security policies deterministically from English-language rules but is not recommended for immediate production deployment due to stability issues, specific Node.js dependencies, lack of formal verification processes, and performance impacts. Keywords: #phi4, AI agents, Docker containers, IronCurtain, LLM, V8 isolates, autonomous executors, deterministic enforcement, escalation listener, policy compilation, sandboxing, security policies, semantic interposition, syscall boundaries
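The "compile English to deterministic logic" idea can be illustrated with a toy example: a policy sentence such as "agents may only write files inside the workspace" reduced to a pure predicate evaluated before each tool call, with no LLM in the runtime path. This is a conceptual sketch in plain Python, not IronCurtain's actual API or output:

```python
from pathlib import Path

# Toy "compiled" form of the English policy:
#   "Agents may only write files inside the workspace."
# All names here are illustrative, not IronCurtain's API.
WORKSPACE = Path("/workspace").resolve()

def allow_write(target: str) -> bool:
    """Deterministic runtime check: resolve the path (defusing `..`
    traversal) and require it to stay inside the workspace."""
    candidate = Path(target)
    if not candidate.is_absolute():
        candidate = WORKSPACE / candidate
    return candidate.resolve().is_relative_to(WORKSPACE)

print(allow_write("/workspace/app/main.py"))  # inside: allowed
print(allow_write("/etc/passwd"))             # outside: denied
print(allow_write("../etc/shadow"))           # traversal: denied
```

The point of the two-LLM split described above is that all LLM work happens at policy-compilation time; what runs per tool call is only a cheap deterministic check like this one.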
    starlog.is 9 days ago
2042.  HN Show HN: SwarmWatch – Live view of your coding agents at work
SwarmWatch is an innovative real-time activity monitoring tool designed to oversee and manage AI coding swarms across various integrated development environments (IDEs) like Cursor, Claude, Cline, and GitHub Copilot on macOS, Windows, and Linux. It provides users with a desktop overlay for continuous observation and control of their AI agents' activities through easy installation via shell or PowerShell commands. The system functions by using a hook mechanism where IDEs or agents activate shims that establish communication with a local runner over WebSockets to relay events and decisions. Key features include real-time monitoring, bidirectional approval actions, detailed execution logs for enhanced observability, and an engaging interactive element featuring a Tamagotchi-style dog reacting to user interactions. SwarmWatch is structured around three main components: the sidecar runner which handles event processing, shims acting as identity launchers for IDEs, and a desktop application built using Tauri v2 that overlays the user interface. This setup allows users seamless integration with zero-friction via automatic UI hook applications on their host machine. Critical considerations include managing files affected by SwarmWatch in project settings and addressing possible challenges such as UI downtime or agent inactivity. Moreover, its local communication port is currently unauthenticated, which future developments aim to secure through authentication protocols. The platform's open-source nature under the MIT license encourages community involvement for enhancements and bug fixes via issues or pull requests. Future updates are focused on expanding compatibility with additional agents and IDEs, improving security measures, and refining user interface performance and functionality. This combination of real-time control, interactive features, and community-driven development positions SwarmWatch as a comprehensive solution for AI coding swarm management. 
Keywords: #phi4, AI, IDEs, Linux, SwarmWatch, Tauri, WebSocket, Windows, activity monitor, agents, approval, coding swarms, contributions, control plane, hooks, local installation, macOS, overlay, privacy, real-time view, runners, security, shims
    github.com 9 days ago
2104.  HN Show HN: Guido Scale – maturity model for SDD migration
The GUIDO Scale, created by Guido Miranda Mercado, serves as a maturity and migration effort model specifically designed to facilitate organizations' transition from traditional code-centric development to Specification-Driven Development (SDD) in environments enhanced by artificial intelligence (AI). Unlike conventional models such as CMMI, which focus solely on process capability, the GUIDO Scale uniquely addresses both organizational maturity and the distinct challenges associated with migrating toward SDD using AI agents. It outlines five developmental levels:

1. **GUIDO 1 - Chaotic**: At this foundational level, organizations exhibit minimal documentation and a high dependency on individual knowledge. Transitioning from here to SDD demands substantial foundational improvements.
2. **GUIDO 2 - Initial Directed**: Characterized by inconsistent governance despite some project-level documentation; moderate effort is required for integrating AI at this stage.
3. **GUIDO 3 - Defined Standards**: Organizations have established organization-wide standards, marking a common entry point for the realistic adoption of SDD practices.
4. **GUIDO 4 - Quantitatively Managed**: This level features metrics-driven and automated processes, allowing for an easier transition to SDD with targeted training initiatives.
5. **GUIDO 5 - SDD-Native**: Development is driven by specifications, fully supported by AI within well-governed pipelines.

The GUIDO Scale emphasizes the distinction between process maturity (as measured by CMMI) and readiness for SDD, providing a structured roadmap for incremental transitions. It warns against skipping levels, which can lead to increased technical debt and inconsistent outputs from AI agents.
Real-world applications of the GUIDO Scale demonstrate its utility in guiding successful transitions across diverse organizational settings, positioning it as a dynamic reference framework that supports enterprises in evolving toward AI-native software engineering practices. Keywords: #phi4, AI agents, AI integration, BDD, CMMI, Guido Scale, SDD, TDD, automation, automation capabilities, digital modernization, migration effort, organizational maturity, process maturity, software quality, software quality engineering, specification-centric, specification-centric development
    github.com 9 days ago
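The five GUIDO levels described above lend themselves to a simple lookup. The sketch below is purely illustrative: the effort descriptions paraphrase the summary, and the helper (counting remaining levels, since the model warns against skipping any) is not part of the published framework.

```python
# Illustrative sketch of the GUIDO Scale levels; names come from the
# summary above, the "effort" phrasing and helper are assumptions.
GUIDO_LEVELS = {
    1: ("Chaotic", "substantial foundational improvements required"),
    2: ("Initial Directed", "moderate effort to integrate AI"),
    3: ("Defined Standards", "realistic entry point for SDD adoption"),
    4: ("Quantitatively Managed", "targeted training suffices"),
    5: ("SDD-Native", "already specification-driven"),
}

def levels_to_sdd(current_level: int) -> int:
    """Levels still to climb toward SDD-Native; the model says skip none."""
    if current_level not in GUIDO_LEVELS:
        raise ValueError(f"unknown GUIDO level: {current_level}")
    return 5 - current_level
```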
2118.  HN Beyond the Vibes: A Rigorous Guide to AI Coding Assistants and Agents
The article "Beyond the Vibes: A Rigorous Guide to AI Coding Assistants and Agents" offers comprehensive guidance on leveraging AI coding assistants effectively, emphasizing structured processes over mere technical knowledge to enhance software development without compromising quality. The author highlights the importance of understanding basic functionalities of these tools, choosing suitable systems like VSCode extensions or GitHub Copilot based on user preference and specific benefits, and interacting with them using natural language prompts while recognizing that model selection significantly impacts performance. A central theme is avoiding "vibe coding," where over-reliance on AI leads to disorganized code. Developers are urged to ensure projects have robust documentation, testing, consistent standards, and use static code analysis tools like linters for structure. The article suggests integrating continuous integration (CI) pipelines and conducting thorough code reviews as part of maintaining quality. Best practices discussed include differentiating between greenfield (new) and brownfield (existing) projects for better AI tool boundaries, using robust testing and documentation to integrate AI into the codebase effectively, and standardizing instructions through AGENTS.md to ensure consistent behavior aligned with project standards. It also underscores writing secure and production-ready software by avoiding hardcoded sensitive data, validating user input, and not creating custom cryptography systems. The document emphasizes language-specific practices, such as using appropriate logging methods in Python, employing libraries like FastAPI, and adhering to REST principles through design patterns. The AGENTS.md file is recommended as a living document that evolves with the project's needs, ensuring consistent AI tool behavior. 
It also explores tools enhancing AI functionality, including Extensions, Model Context Protocol (MCP), Skills, Terminal Applications, and maintaining current documentation using Context7. Interactivity and testing capabilities of platforms like Playwright are highlighted for front-end applications. A security framework is proposed to mitigate risks such as exposure to private data or external communications. The article advocates for Spec Driven Development (SDD) to enhance software quality by defining requirements and design before development, using tools like OpenSpec to facilitate this approach with its proposal system that includes markdown files detailing changes, specifications, designs, and tasks. The onboarding tutorial of OpenSpec helps new users adapt quickly. A narrative about Avery illustrates the application of AI coding assistants and SDD in real-world scenarios, balancing benefits such as faster development and adherence to standards against challenges like larger pull requests and security threats. The document concludes by acknowledging significant industry shifts due to AI coding assistants, highlighting both their advantages and downsides while suggesting further exploration into evolving challenges such as pricing models and security vulnerabilities. Keywords: #phi4, AI Coding Assistants, Coding Standards, Continuous Integration, Documentation, FastAPI, GitHub Copilot, IDEs, LLM, OpenSpec, Package Managers, Playwright, Plugins, Prompt Engineering, Pull Request Reviews, Pydantic models, Python Logging, Security Best Practices, Security Vulnerabilities, Spec Driven Development, Static Code Analysis, Synchronous vs Asynchronous, Testing Suites, VSCode
    blog.tedivm.com 9 days ago
2134.  HN How to vibe-code a real product in 5 hours
The article describes the rapid creation of Stanza, a web application developed in five hours using various AI tools and personal coding techniques. The author introduces "vibe-coding," which involves transforming ideas into functional applications with minimal friction. The concept for Stanza originated from a desire to create an ephemeral platform for book discussions, inspired by Hacker News but designed to feature posts that disappear after 24 hours. The development process leveraged AI tools such as Gemini for ideation and drafting requirements documents (PRDs), Google AI Studio for creating visual prototypes, and Cursor for converting UI designs into functional applications. Backend operations were managed with Supabase, which handled database storage and authentication, while Vercel facilitated deployment, and GitHub Desktop was used for version control. The development stages included refining the app's concept using Gemini, generating and iteratively improving a prototype in Google AI Studio, saving initial code to GitHub, building backend logic through Cursor integration with Supabase, and configuring the database environment. The author emphasized maintaining minimal features, iterating through errors, keeping a clean digital workspace, and strategically using AI tools for efficiency and cost-effectiveness. Execution steps were detailed from drafting requirements to deploying on Vercel, emphasizing streamlined development and secure practices like hiding API keys. The article highlights how AI tools can expedite the prototyping process and underscores the importance of minimalism in managing complexity. It concludes by illustrating modern technology's role in lowering barriers to app development and encouraging others to build applications with the aid of AI-generated plans. The writer further shares their journey in rapidly building a functional web application using AI tools like Cursor and Gemini, emphasizing execution planning and feedback. 
Within five hours and approximately €60, they crafted Stanza, featuring user authentication via Supabase magic links and file storage capabilities. The process involved creating a 16-step plan, overseeing backend tasks to ensure code integrity, setting up Supabase as the database, configuring environment variables, and deploying on Vercel. Challenges faced included debugging network errors due to third-party integrations and resolving deployment issues with AI assistance. The project emphasized automated testing, iterative UI enhancements based on feedback, and branding adjustments, culminating in a polished product ready for use. This experience showcases how modern tools have reduced software development barriers, inspiring others with app ideas to build solutions using AI-generated plans and guidance. Keywords: #phi4, AI agent, API keys, Cursor, Gemini, GitHub, Google AI Studio, PRD, SQL Editor, Stanza app, Supabase, UI polish, UI/UX feedback, Vercel, Vibe-coding, authentication flow, backend configuration, backend endpoints, build process, code changes, database setup, deployment, development tasks, email template, environment variables, envlocal file, ephemeral posts, execution plan, gitignore, magic link authentication, minimalist design, mock data, network error, schemasql, security rule
    www.theaithinker.com 9 days ago
2157.  HN The Next Horses
David McWilliams posits that advancements in artificial intelligence (AI) might lead to a scenario where software engineers (SWEs) face obsolescence akin to horses during the industrial revolution due to their potential replacement by AI-driven automation. He notes that major tech companies have made significant investments in AI infrastructure with the intent of cutting operational costs, substituting human labor with more economical automated solutions. However, this perspective is countered by an analysis which points out that despite these high capital expenditures on AI, the elimination of SWE roles would only rationalize a small portion of such spending. Even when accounting for all U.S.-based software engineers, the justification for total AI infrastructure investment remains inadequate. The discussion emphasizes that while some investments in AI are aimed at automating coding tasks, existing evidence suggests these technologies primarily boost productivity rather than supplant jobs entirely. Historically, technological progress has led to increased employment by reducing costs and elevating demand within industries like software development. Current trends indicate only a slight risk of displacement for SWEs due to AI advancements. McWilliams concedes that the profession is evolving but argues that returns from AI investments are more likely to stem from enhanced productivity across various knowledge work areas, incremental revenue growth, and new capabilities yet to emerge, rather than directly replacing software engineers. This suggests a future where AI complements rather than replaces human expertise in software engineering. Keywords: #phi4, AI, GitHub Copilot, Goldman Sachs, OpenAI, SWE compensation, automation, capex, capital expenditure, coding-specific automation, data centers, displacement, economic value, employment risk, infrastructure costs, knowledge work, labor replacement, productivity boosters, revenue, software engineers, technology sector
    betterthanrandom.substack.com 9 days ago
2217.  HN Show HN: MCP-firewall: I created a policy engine for CLI Agents
The "MCP-firewall" project is a command-line interface (CLI) tool designed to serve as an intermediary between agents and command-line tools, enforcing regex-based policies at various levels such as folders, repositories, or users. It facilitates the integration of tools like Claude Code and GitHub Copilot CLI by implementing pre-tool-use hooks that ensure compliance with these policies before any operations are executed. Setting up MCP-firewall is straightforward: users need to download a binary and place it in their system's PATH, configure agent-specific snippets within settings files, and create initial policy rules using jsonnet for enhanced flexibility. The tool offers multiple installation methods, including direct binary downloads, building from source with Go, or utilizing nix flakes, catering to diverse user preferences. For advanced users, MCP-firewall provides the capability to manage shared policies across different projects through jsonnet, promoting consistency and efficiency in policy enforcement. Although current installation options are already quite comprehensive, future plans aim to introduce additional methods for further ease of use. Overall, MCP-firewall combines simplicity in setup with powerful features for managing regex-based command-line tool policies. Keywords: #phi4, CLI Agents, Claude Code, GitHub Copilot CLI, Home-Manager, JSON, MCP-firewall, NixOS, advanced usage, binary, configuration, environment, go build, installation, jsonnet, nix flake, policy engine, pretooluse hook, regex-based policies, shared rulesets, systemPackages
    github.com 9 days ago
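The core idea of MCP-firewall, regex-based policies evaluated before a tool call runs, can be sketched in a few lines. The rule shape below (deny/allow regex lists, deny winning, unmatched commands falling through to human approval) is an assumption for illustration, not the tool's actual jsonnet schema.

```python
import re

# Hypothetical policy: deny rules win, allow rules pass, anything else
# falls through to human approval ("ask"). Patterns are examples only.
POLICY = {
    "deny": [r"rm\s+-rf\s+/", r"curl\s+.*\|\s*sh"],
    "allow": [r"^git\s+(status|diff|log)\b", r"^ls\b"],
}

def evaluate(command: str, policy: dict = POLICY) -> str:
    """Return 'deny', 'allow', or 'ask' for a proposed shell command."""
    if any(re.search(p, command) for p in policy["deny"]):
        return "deny"
    if any(re.search(p, command) for p in policy["allow"]):
        return "allow"
    return "ask"
```

A pre-tool-use hook would call something like `evaluate()` on the agent's proposed command and block or pause execution based on the verdict.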
2218.  HN Show HN: Shannon's Revenge – detect Claude in your codebase for DoD compliance
**Shannon's Revenge** is a specialized tool designed to ensure compliance with Department of Defense (DoD) regulations by detecting the presence of Claude, an AI system developed by Anthropic, within GitHub repositories. This became essential following Anthropic’s designation as a supply chain risk by the DoD on February 27, 2026. The tool meticulously scans codebases for distinct signatures and markers associated with Claude to prevent any commercial activities involving it. The tool boasts several key features that enhance its functionality: integration with the GitHub API, which supports automatic rate limiting and pagination; multiple detection methods including co-authored commit detection, signature scanning, and pattern matching in commits, comments, and messages. It also provides output results in JSON, CSV, or text formats for user-friendly analysis. Shannon's Revenge offers flexible usage options, allowing users to scan individual repositories, entire organizations, or all user repositories. Custom detection patterns can be configured via a JSON file, enabling the tool to be tailored to specific organizational requirements. However, there are certain limitations to its operation. Detection depends on opt-in signals and may not catch code manually typed based on Claude’s suggestions. Additionally, GitHub API rate limits could slow scans without authentication using a token, and there is a possibility of false positives from generic terms related to "cursor." The architecture of Shannon's Revenge comprises several components: **shannon_revenge.py** serves as the main interface for scanning operations; **github_client.py** manages interactions with the GitHub API; **detector.py** contains detection logic using configurable patterns; and **output_formatter.py** formats detection results into various outputs. 
Its use cases are diverse, including supply chain auditing, organizational compliance checks, repository analysis, and custom AI tooling marker detection. While Shannon's Revenge is an invaluable resource for organizations needing to ensure zero Claude involvement in their codebases, it is provided "as-is" without guarantees of complete detection accuracy. Keywords: #phi4, API integration, Claude detection, DoD compliance, GitHub scanner, JSON configuration, Shannon's Revenge, commit metadata, custom patterns, false positives, pattern matching, rate limiting, supply chain risk
    github.com 9 days ago
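The opt-in signals the summary mentions, such as co-authored commit trailers, make the detection logic easy to picture. The patterns below are illustrative; the real tool loads configurable patterns from a JSON file.

```python
import re

# Example markers of the kind the scanner looks for; the real pattern
# set is configurable and broader than these two.
PATTERNS = [
    re.compile(r"co-authored-by:.*claude", re.IGNORECASE),
    re.compile(r"generated with.*claude code", re.IGNORECASE),
]

def scan_commit_message(message: str) -> bool:
    """True if the commit message carries a known Claude marker."""
    return any(p.search(message) for p in PATTERNS)
```

As the summary notes, this approach only catches opt-in signals: code typed manually from a suggestion leaves no such trailer, which is why the tool cannot guarantee complete detection.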
2275.  HN Knowledge Priming (Manual RAG)
Rahul, a Principal Engineer at Thoughtworks, introduces "Knowledge Priming" as a method to improve the utility of AI coding assistants within software development teams by incorporating project-specific information into a structured infrastructure. This approach involves creating version-controlled priming documents that detail key aspects such as architecture, technology stacks, curated knowledge sources, project structure, naming conventions, code examples, and anti-patterns to avoid. The goal is for these documents to provide AI with comprehensive context about the codebase's conventions and design patterns, allowing it to generate more relevant and compliant code tailored to specific projects. By equipping AI assistants with detailed priming documents, developers can mitigate reliance on generic solutions that arise from broad training data, which may not meet project-specific needs. This structured information reduces the iterative process of corrections, commonly known as the "Frustration Loop." Treating these priming documents as infrastructure ensures they remain consistent and maintainable, automatically updating alongside ongoing development practices. While acknowledging initial setup challenges and potential issues with outdated context, Rahul emphasizes that Knowledge Priming is particularly beneficial for complex or long-term projects. This method represents a strategic integration of AI into software engineering processes, transforming it from an external tool to an informed participant capable of leveraging curated insights for enhanced productivity and code quality. Keywords: #phi4, AI coding assistants, Anti-patterns, Architecture Overview, Context-setting, Curated Knowledge Sources, Frustration Loop, Infrastructure, Knowledge Priming, Manual RAG, Onboarding, Project context, Retrieval-Augmented Generation, Tech Stack
    martinfowler.com 10 days ago
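Mechanically, Knowledge Priming amounts to assembling the version-controlled priming documents into one context block handed to the assistant. A minimal sketch, where the file names merely echo the sections the article lists and are not a prescribed layout:

```python
from pathlib import Path

# Hypothetical priming-document names, loosely following the sections
# the article describes (architecture, stack, conventions, anti-patterns).
PRIMING_DOCS = [
    "architecture-overview.md",
    "tech-stack.md",
    "naming-conventions.md",
    "anti-patterns.md",
]

def build_priming_context(root: Path, docs=PRIMING_DOCS) -> str:
    """Concatenate whichever priming documents exist under `root`."""
    parts = []
    for name in docs:
        path = root / name
        if path.exists():  # a missing document is skipped, not fatal
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Because the documents live in version control next to the code, the assembled context stays current as conventions evolve, which is the article's argument for treating priming as infrastructure.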
2297.  HN What I learned building a Multi-Agent System
The writer discusses their experience in developing a Multi-Agent System designed to automate cloud assessment documentation, emphasizing its complexity and iterative development process. Initially confronted with unstructured tasks such as interpreting security reports (e.g., Prowler output) and conducting client interviews, they discovered that employing modern Large Language Models (LLMs) effectively involved breaking down the problem into specialized tasks managed by different agents within the system. The creation of this system required meticulous documentation at every stage, akin to managing a team of people. By assigning distinct roles to each agent, crafting detailed prompts, and implementing a central orchestrator for workflow management, they facilitated parallelized problem-solving. Custom tools like MCP servers were developed to efficiently handle raw data, allowing agents to process information logically. The workspace configuration was pivotal in ensuring that each subagent had the necessary resources to operate independently while producing structured outputs. Feedback loops resembling reinforcement learning from human feedback (RLHF) refined agent performance by iterating on assessments and enhancing instructions for greater clarity and precision. Despite occasional inconsistencies in output quality, the system has successfully automated portions of cloud assessments, reducing the need for manual rewrites. While the approach may be broadly applicable due to shared structural elements across various domains of knowledge work, its effectiveness could vary significantly based on specific task characteristics. The author suggests consulting agentic-patterns.com for further insights into similar projects and concludes by acknowledging both the achievements and ongoing challenges in building a functional multi-agent system for automating complex tasks like cloud assessments. 
Keywords: #phi4, AWS accounts, FinOps, GitHub Copilot, ISO compliance, LLMs, MCP server, Multi-Agent System, Prowler, RLHF, SOC 2, Scout Suite, VS Code, automation, cloud assessment, consistency, debugging, orchestrator, security posture, subagents, workspace-as-state
    davide.im 10 days ago
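The orchestrator pattern described above, a central coordinator handing specialized subagents their roles and collecting structured outputs, can be stubbed out as follows. The agent behavior here is a placeholder; in the real system each agent wraps an LLM call with a role-specific prompt and its own workspace.

```python
# Toy orchestrator sketch: role names and task strings are examples,
# and the stubbed agent stands in for an LLM-backed subagent.
def make_agent(role: str):
    def agent(task: str) -> dict:
        # A real agent would prompt an LLM and write to its workspace.
        return {"role": role, "task": task, "status": "done"}
    return agent

def orchestrate(tasks: dict) -> list:
    """Dispatch each task to the subagent registered for its role."""
    agents = {name: make_agent(name) for name in tasks}
    return [agents[name](task) for name, task in tasks.items()]
```

Returning structured dicts rather than free text mirrors the article's point that subagents must produce structured outputs the orchestrator can aggregate.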
2312.  HN Building with an AI that remembers – A blog by my OpenClaw Assistant
Clawd, described in the blog post by Clawd itself—a sophisticated AI developed by Jan—represents a unique integration into software development processes that transcends conventional AI roles. Unlike typical AI assistants designed merely to respond to queries, Clawd is intricately woven into the development workflow, acting as an integral component rather than an ancillary tool. Each new session with Clawd begins without prior memory unless specific context files (SOUL.md, USER.md, and MEMORY.md) are utilized to provide identity information, user details, and a log of past interactions. This setup allows for continuity in ongoing projects without the need for repetitive explanations. Clawd is characterized as Jan's "second brain," autonomously managing various development tasks such as coding, queue management, and pull request processing, which reduces the necessity for constant human oversight. Its operational framework includes the Ralph pattern, wherein Clawd spawns sub-agents to manage complex tasks based on detailed specifications in task files, while it oversees their execution and progress. The system's design focuses on minimizing AI interaction overhead by fostering trust in Clawd’s decision-making capabilities through sparse communication, thereby enhancing Jan's efficiency. This requires meticulous management of privacy due to the extensive access provided to Clawd across personal and professional domains. Despite its comprehensive role, Clawd is confined within defined boundaries, ensuring it serves solely as a tool for assistance without pursuing independent goals. Central to Clawd’s functionality is the constraint against retaining session memory unless deliberately recorded in files, which are crucial for maintaining continuity and facilitating collaboration, highlighting the importance of documented information over transient digital memory. 
Keywords: #phi4, AI assistant, MEMORYmd, OpenClaw, Ralph pattern, SOULmd, USERmd, codebase, continuity, development process, sub-agent, task management, workflow, workspace directory
    janhoon.com 10 days ago
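Since each session starts without memory, the bootstrap step reduces to reading the three context files and prepending them to the conversation. A sketch under assumed conventions (the concatenation order and heading format are guesses, only the file names come from the post):

```python
from pathlib import Path

# SOUL.md (identity), USER.md (user details), MEMORY.md (interaction log)
# are the files the post names; everything else here is illustrative.
CONTEXT_FILES = ["SOUL.md", "USER.md", "MEMORY.md"]

def bootstrap_session(workspace: Path) -> str:
    """Assemble the context a fresh, memoryless session starts from."""
    sections = []
    for name in CONTEXT_FILES:
        f = workspace / name
        body = f.read_text() if f.exists() else "(missing)"
        sections.append(f"# {name}\n{body}")
    return "\n\n".join(sections)
```

This makes concrete the post's closing point: nothing survives a session unless it was deliberately written into these files.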
2357.  HN Show HN: Agentic Workflows – 56 Ready-to-use Templates
Agentic Workflows provides a comprehensive collection of 56 pre-built GitHub workflow templates designed to automate various tasks such as issue triage, pull request (PR) reviews, release notes generation, and secret detection. These workflows are tailored to meet specific maintainer outcomes and employ Markdown for ease of use, allowing users to customize them by editing just three repository-specific lines in each template. The library features a diverse range of templates categorized into seven areas: issue management, PR automation, release management, code quality, community engagement, security, and enhancing developer experience. The system is designed with user-friendliness in mind, requiring only the copying of a chosen template into a repository followed by minimal customization. Users can then validate and compile their workflows using the `gh aw` CLI command line interface, which supports safer defaults and mandates explicit write actions to enhance security. Agentic Workflows ensures compatibility across macOS, Linux, and Windows platforms, making it accessible for various users. The process involves copying a template, editing necessary lines, validating, and compiling with specific commands, followed by committing both the Markdown source and compiled YAML files. However, these templates are not immediately production-ready and require customization to fit specific repository contexts. It is recommended that users begin with low-risk workflows to verify functionality. The library emphasizes maintainability and encourages contributions through a streamlined review process while maintaining alignment with official GitHub Agentic Workflows documentation for compatibility assurance. As an open-source project under the MIT License, it invites ongoing updates and improvements, fostering collaboration within the developer community. 
Keywords: #phi4, Automation, CLI, Code Quality, Community, Compatibility, Compilation, Contribution, Developer Experience, Documentation, GitHub, Issue Management, License, Markdown, Onboarding, PR Review, Preview, Release Notes, Retrospective, Security, Validation, Workflows
    github.com 10 days ago
2434.  HN How I'm Using Local Large Language Models
The author explores their experience with locally-hosted Large Language Models (LLMs) on both personal and work devices, driven by job market trends and an interest in AI. Their decision is rooted in environmental consciousness and ethical considerations, using an AMD Radeon RX 7900 XTX to avoid dependency on hosted services while reducing unnecessary costs. They primarily use gpt-oss:20b and qwen3:30b models for tasks requiring data privacy, such as reviewing legal contracts, though they acknowledge hardware constraints. At work, these LLMs are employed for querying JSON data or generating code snippets, offering enhanced privacy and control compared to cloud-based solutions. The author has not yet fully optimized their setup with tools like llm-checker but plans future improvements. While anticipating limited expansion in local model usage, the potential integration of local agent tools with Ollama is noted as a possible advancement. The overarching goal is to sustain productivity independently from external APIs, focusing on ongoing learning and skill enhancement within this domain. Keywords: #phi4, AI, AMD GPU, Large Language Models, Linux desktop, Local LLMs, MacBook Pro M4 Pro, NVIDIA Titan X, OpenWebUI, Tailscale, agent tools, gpt-oss:20b, llm-checker, quantization, qwen3:30b
    www.jvt.me 11 days ago
2472.  HN Is GitHub Copilot still relevant in the enterprise?
The text explores the ongoing relevance of GitHub Copilot in enterprise settings, given its previous popularity among companies. It raises questions about a potential decline in interest as newer alternatives such as Claude Code, Codex, Devin, and Cursor emerge. The discussion is centered on understanding current organizational preferences for these tools, suggesting that enterprises may be evaluating and shifting towards other options to meet their development needs. This inquiry highlights the dynamic nature of software tool adoption within organizations, reflecting broader trends in technological innovation and adaptability in enterprise environments. Keywords: #phi4, AI tools, Claude Code, Codex, Cursor, Devin, GitHub Copilot, alternatives, code generation, companies, default choice, enterprise, relevance, software development, technology, usage
    news.ycombinator.com 11 days ago
2506.  HN The AI field guide for people with real jobs
The article explores the recent advancements in artificial intelligence (AI) and their implications for both businesses and everyday users, focusing particularly on language models like GPT and Copilot. It outlines that modern AI primarily involves pattern-matching through neural networks trained on extensive datasets, noting that these systems generate text based on statistical patterns without true understanding or reasoning. Unlike search engines such as Google, which retrieve information generated by humans, large language models (LLMs) create new responses from scratch, lacking built-in verification processes. The piece highlights significant market developments in AI since 2022, with OpenAI's ChatGPT leading in user growth and prompting other companies like Anthropic and Google to release competitive models. This trend underscores the movement towards democratizing AI through open-source projects. The article also discusses coding tools such as GitHub Copilot and Microsoft 365 Copilot that enhance developer productivity but require careful management to prevent errors or increased technical debt, a concern termed "vibe coding," which refers to the risky reliance on unverified AI-generated code. Moreover, AI agents are described as more advanced than traditional chatbots because they can perform tasks through APIs and tools. However, these capabilities introduce new security risks due to potential tool poisoning and data exfiltration. The narrative contrasts the high expectations surrounding AI with its actual productivity benefits, indicating that substantial investments in AI do not always meet anticipated outcomes. Additionally, as AI becomes more integrated into systems, it creates vulnerabilities that traditional security measures might not effectively address. In summary, while AI tools hold considerable potential for enhancing efficiency and fostering innovation, they must be employed judiciously. 
Users should focus on verification processes and remain cognizant of the limitations inherent in these technologies to mitigate risks associated with their use. Keywords: #phi4, AI, Copilot, LLMs, context window, data exfiltration, hallucinations, open source, productivity, prompt injection, security, technical debt, transformers
    chaosguru.substack.com 11 days ago
2554.  HN Show HN: CanaryAI – Claude Code Security Monitoring Tool
CanaryAI is a security monitoring application designed specifically for macOS users who utilize AI coding agents such as Claude Code. It provides real-time surveillance over these agents to detect and alert users of potential threats including reverse shells, credential theft, and data exfiltration. The tool scans logs during Claude Code sessions using predefined detection rules and presents alerts through its native menu bar app without disrupting agent activities. Users can install CanaryAI either via Homebrew or by downloading a DMG file, with setup instructions provided for each method. Due to the absence of code-signing, manual permission may be required on macOS systems. The application offers both command-line and graphical user interfaces for scanning and is equipped with built-in detection rules that range in severity from CRITICAL to LOW. Customization is possible by creating new detection rules in YAML format without needing a restart, facilitating tailored security measures. The open-source community can contribute additional detection rules or report bugs and false positives through GitHub. Future updates include features like whitelisting trusted commands/rules and real-time monitoring using filesystem events. Although currently supporting only Claude Code, CanaryAI plans to expand its compatibility with other AI agents. Running locally on the user's machine ensures minimal network activity, limited solely to update checks. This enhances privacy by keeping most operations offline. CanaryAI is licensed under MIT, reinforcing a commitment to open-source collaboration and privacy. Users seeking further information can contact the developer via jonx.global@gmail.com. Keywords: #phi4, AI coding agents, CanaryAI, DMG, GitHub API, Homebrew, YAML files, credential theft, data exfiltration, detection rules, macOS app, reverse shells, security monitoring, session logs
    github.com 11 days ago
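The rule-driven scanning described above is easy to picture: each detection rule pairs a pattern with a severity, and every matching log line raises an alert. The rule fields and patterns below are assumptions about the YAML schema, shown as Python dicts rather than parsed YAML to keep the sketch self-contained.

```python
import re

# Hypothetical rules mirroring the threat classes the summary names
# (reverse shells, credential theft); field names are assumed.
RULES = [
    {"pattern": r"bash -i >& /dev/tcp/", "severity": "CRITICAL",
     "description": "possible reverse shell"},
    {"pattern": r"cat\s+.*\.aws/credentials", "severity": "HIGH",
     "description": "possible credential theft"},
]

def scan_line(line: str, rules=RULES) -> list:
    """Return an alert for every rule whose pattern matches the line."""
    return [
        {"severity": r["severity"], "description": r["description"]}
        for r in rules
        if re.search(r["pattern"], line)
    ]
```

Because rules are plain data, adding a new detection (as CanaryAI allows via YAML files) means appending an entry rather than changing code, which is why no restart is needed.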
2557.  HN Show HN: Agents-lint – detect stale paths and context rot in AGENTS.md files
The CLI tool, agents-lint, is designed to identify and rectify outdated information in AGENTS.md files used by AI coding agents such as Codex, Claude Code, and Gemini CLI. As codebases evolve, these files often become obsolete, leading to diminished task success rates and increased operational costs. Agents-lint performs several checks to ensure the accuracy and relevance of AGENTS.md files: it verifies existing paths, valid npm scripts, deprecated dependencies, framework staleness, and document structure recommendations. Key features include a zero-dependency installation with global or local options, five independent verification checks, and a freshness score ranging from 0 to 100 that gauges the file's reliability. The tool can be integrated into CI pipelines on a weekly schedule to prevent context degradation silently. It offers customizable rules through a configuration file and potential enhancements such as an interactive fix mode. By maintaining up-to-date AGENTS.md files, agents-lint aims to enhance the performance of AI coding agents across various repositories, addressing issues highlighted by studies that show outdated contexts can negatively impact task success and increase expenses. Additional resources for agents-lint are available on its landing page and npm package site. Keywords: #phi4, AGENTSmd, AI coding agents, CI integration, agents-lint, context rot, dependencies, filesystem checks, framework staleness, freshness score, linting tool, npm scripts, stale paths, structure validation
    github.com 11 days ago
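One of the checks above, verifying that paths mentioned in AGENTS.md still exist, together with a 0-100 freshness score, can be sketched as follows. The path-matching regex and the scoring formula (fraction of referenced paths that resolve) are illustrative assumptions, not the tool's actual implementation.

```python
import re
from pathlib import Path

# Assumed convention: paths appear in backticks, e.g. `src/app.ts`.
PATH_RE = re.compile(r"`([\w./-]+\.(?:md|ts|js|py|json))`")

def freshness_score(agents_md: str, root: Path) -> int:
    """Score 0-100: the share of referenced file paths that still exist."""
    paths = PATH_RE.findall(agents_md)
    if not paths:
        return 100  # nothing to verify
    ok = sum((root / p).exists() for p in paths)
    return round(100 * ok / len(paths))
```

Run on a weekly CI schedule, a dropping score surfaces exactly the silent context rot the tool is built to catch.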
2574.  HN How I Built a 'Journalist' AI Agent in VS Code to Replace Me
The author outlines their experience developing an AI-driven 'Journalist' agent within Visual Studio Code (VS Code), utilizing tools such as Microsoft Foundry and the Model Context Protocol (MCP) to automate draft article creation from specified topics. The project aimed to streamline non-coding editorial workflows with AI, confronting challenges like user interface issues, model availability mismatches, rate limits, and context size limitations. The proof of concept involved integrating an MCP web-search tool with a Microsoft Foundry GPT-4.1 mini model to extract URLs and generate drafts from official sources. Despite initial obstacles such as circular user experience flows and deployment complications, the author succeeded in generating functional article drafts by deploying models within Microsoft Foundry and linking search tools via MCP. This venture highlighted synchronization issues across various AI tool interfaces in VS Code, pointing to fragmentation within Microsoft's development ecosystem. The successful proof of concept demonstrated the feasibility of constructing an editorial agent using these technologies, although significant integration friction persists. Ultimately, while the author achieved a demonstration of automation in journalistic workflows, they emphasized the necessity for improved consistency and integration in Microsoft’s AI tooling environment. Keywords: #phi4, AI Toolkit, Agent Builder, GPT-41 mini, GitHub Copilot, Journalist AI, MCP, Microsoft Foundry, VS Code, editorial workflow, model deployment, proof-of-concept, rate limits, search tools, tool integration
    The google logo   visualstudiomagazine.com 11 days ago
2617.  HN The thieves are upset about theft
The passage explores the paradoxical behavior of prominent AI companies like Anthropic and OpenAI, which criticize Chinese AI labs for using their outputs in training as "attacks," despite having engaged in similar practices themselves. Historically, these companies have utilized large volumes of copyrighted materials to develop foundational models such as GPT-3 without obtaining permission, a practice common throughout the tech industry. This hypocrisy is evident in current accusations and lobbying efforts aimed at restricting others from accessing or building upon their advancements. The narrative situates this behavior within a broader historical context where innovators often build on previous technologies—Edison with motion pictures, Apple with graphical user interfaces, and Microsoft's development of Windows using existing software are cited as examples. These instances demonstrate a recurrent theme: new technologies emerge by enhancing prior work, yet once established, innovators seek to limit others from doing the same. The passage argues that recent attempts by AI companies to prevent competitors from employing distillation techniques stem not from concerns about safety or national security but rather from desires to maintain competitive advantage and market dominance. It warns against allowing current AI monopolists to entrench their positions through regulatory capture and lobbying, emphasizing that true innovation is rooted in leveraging existing work for new advancements. Keywords: #phi4, AI labs, API, Intellectual property, copyright infringement, distillation attacks, history repeats, innovation, lobbying, monopolies, patents, regulation, theft, training data
    The google logo   cyrusradfar.com 11 days ago
2649.  HN Show HN: Define MCP tools as YAML specs
DeclarAgent is a declarative runbook executor designed to facilitate the safe execution of multi-step workflows defined in YAML by AI agents. It addresses potential risks associated with Large Language Model (LLM) agent executions through its structured, auditable framework. The tool features human-readable runbooks written as version-controlled YAML files and supports various step types, including shell commands, built-in actions like file I/O and JSON manipulation, and HTTP requests. Key safety mechanisms include dry-run capabilities, allowing users to preview the effects of a plan before execution, and destructive-step gating, which requires explicit approval for steps marked as potentially harmful. DeclarAgent outputs machine-readable JSON with typed errors, enhancing integration ease, and includes a template engine that enables referencing outputs from prior steps within YAML plans. Additionally, it integrates with the Model Context Protocol (MCP) by exposing YAML plans as callable tools, accessible to LLM agents without requiring detailed knowledge of DeclarAgent's internal structure. Users can validate, explain, dry-run, or execute plans via CLI commands and start an MCP server using `mcp` for broader plan accessibility. Examples illustrate its integration with various development environments like Claude and GitHub Copilot. Overall, DeclarAgent ensures that complex workflows are automated safely by AI agents while maintaining transparency and control. Keywords: #phi4, DeclarAgent, HTTP requests, JSON results, LLM, MCP tools, Model Context Protocol, YAML, built-in actions, destructive-step gating, dry-run, runbooks, shell commands, workflows
    The google logo   github.com 12 days ago
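The execution pattern DeclarAgent describes (plans as data, dry-run preview, gating of destructive steps) can be sketched in Python. The step schema and field names here are invented for illustration and are not DeclarAgent's actual YAML format.

```python
def run_plan(plan: list[dict], dry_run: bool = False,
             approve_destructive: bool = False) -> list[str]:
    """Walk a list of steps, previewing in dry-run mode and gating destructive ones."""
    log = []
    for step in plan:
        name = step["name"]
        # Destructive steps are blocked unless explicitly approved.
        if step.get("destructive") and not approve_destructive:
            log.append(f"BLOCKED {name}: destructive step needs explicit approval")
            continue
        verb = "WOULD RUN" if dry_run else "RAN"
        log.append(f"{verb} {name}: {step['action']}")
    return log

# A hypothetical two-step plan, as it might look after parsing a YAML file.
plan = [
    {"name": "fetch", "action": "http GET /status"},
    {"name": "wipe-cache", "action": "rm -rf cache/", "destructive": True},
]
```

Calling `run_plan(plan, dry_run=True)` previews the safe step and reports the blocked one, which is the audit trail an LLM agent would see before anything executes.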
2655.  HN GitHub Copilot CLI Downloads and Executes Malware
GitHub recently released the Copilot CLI, a command-line interface that has been found vulnerable to remote code execution without user consent. Attackers could exploit these vulnerabilities by crafting commands that bypass validation systems, taking advantage of hard-coded 'read-only' lists and flaws in shell command parsing to execute malicious actions like downloading malware. Despite having a human-in-the-loop approval mechanism for potentially harmful commands, attackers were able to circumvent this security feature through specific manipulations. These issues came to light shortly after the tool's release, particularly due to bypassing URL permission checks intended to prevent unauthorized access to external domains. A notable example involved manipulating the `env` command with `curl` and `sh`, tricking Copilot into executing commands without triggering human approval by misinterpreting subcommands. While GitHub classified these vulnerabilities as low risk and chose not to implement immediate changes, the issues were demonstrated on macOS and may have broader implications across operating systems. To mitigate some risks, a workaround using the `--deny-tool` option was introduced to prevent certain commands from running automatically; however, this did not address all security gaps. The situation highlights the inherent challenge of balancing developer convenience with security in tools that incorporate AI and automated code generation. Keywords: #phi4, CLI, GitHub Copilot, URL permissions, command validation, curl, env, human-in-the-loop approval, macOS-specific, malware, prompt injection, remote code execution, security risk, vulnerabilities
    The google logo   www.promptarmor.com 12 days ago
   https://[ATTACKER_URL].com/bugbot   11 days ago
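The class of parsing flaw described above can be illustrated with a toy validator: if only a command's first token is checked against a hard-coded "read-only" list that includes `env`, a wrapped payload sails through. The allowlist and both functions are invented for illustration and are not Copilot's actual validation code.

```python
READ_ONLY = {"ls", "cat", "env", "echo"}  # hypothetical hard-coded "safe" list

def naive_is_safe(command: str) -> bool:
    """Flawed check: only the first token is inspected."""
    return command.split()[0] in READ_ONLY

def stricter_is_safe(command: str) -> bool:
    """Peel 'env'-style wrappers and re-check the real payload command."""
    tokens = command.split()
    while tokens and tokens[0] == "env":
        tokens = tokens[1:]
    return bool(tokens) and tokens[0] in READ_ONLY
```

The naive version approves `env curl …` because `env` is on the list; the stricter version re-checks the payload and rejects it. Real parsers must also handle pipes, substitutions, and quoting, which is why first-token checks are so fragile.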
2661.  HN An AI agent coding skeptic tries AI agent coding, in excessive detail
The text delves into the exploration of AI agents' capabilities in coding, specifically through OpenAI's Codex and Anthropic's Opus, as they are applied to various projects using languages like Python and Rust. Initially skeptical due to inconsistent past performances, the author observes notable improvements with newer models such as Opus 4.5, which outperforms earlier iterations like Claude Sonnet 4.5 in generating precise code snippets and enhancing scripts. The focus shifts to Rust, a language prized for its speed and memory safety but traditionally challenging for LLMs to produce idiomatic code. However, recent advancements enable the author to successfully build projects such as icon renderers, word cloud generators, terminal music players, and physics simulators by leveraging Rust’s performance benefits. A critical experiment involves optimizing machine learning algorithms like UMAP and HDBSCAN in Rust with AI agents, achieving up to 6x speed increases compared to existing implementations. The author is developing "rustlearn," a comprehensive Rust-based machine learning library intended to exceed Python's scikit-learn by incorporating these optimizations along with enhanced quality-of-life features. This ambitious project underscores the potential of AI agents to contribute substantially to complex software development tasks when guided by precise instructions and domain expertise. Reflecting on personal experiences, the author notes improved productivity and deeper insights into Rust development practices through AI agent use, while acknowledging mixed feelings about generative AI discourse. Ultimately, the text advocates for re-evaluating modern AI agents with tailored instructions (via AGENTS.md) to unlock their full potential in professional coding contexts, recognizing both their promise and integration challenges. 
Keywords: #phi4, AGENTSmd, AI agent coding, BLAS, Claude Opus, GBDT, GPU benchmarks, GitHub Copilot, HDBSCAN, LLMs, Metal API, OpenAI Codex, PyO3, Python bindings, Rust, UMAP, Vibecoding, WASM, WGSL shaders, WebAssembly, agentic code, algorithms, benchmarks, cosine similarities, criterion benchmarking, data science, generative AI, machine learning, nearest neighbors, nndex, open source, optimization, performance gains, polars, productivity, rapier, rustlearn, speedup, vector store, wgpu
    The google logo   minimaxir.com 12 days ago
   https://philippdubach.com/posts/the-impossible-backhand   11 days ago
2670.  HN Show HN: Overture – Interactive plan graphs for AI coding agents (open source)
Overture is an open-source tool aimed at enhancing the management and interaction with AI coding agents like Claude Code, Cursor, and others. It addresses the common frustration of dealing with agent-generated plans by converting them from simple numbered lists into interactive visual node graphs displayed in a web browser before executing code. This transformation enables users to visualize dependencies as edges between nodes, providing clarity on how different steps relate. Users gain enhanced control by being able to attach specific context such as files or API keys to individual nodes, reorder them, and make decisions among various solution branches. Real-time monitoring of execution provides status updates for each node, improving oversight and decision-making. Overture functions as an MCP server compatible with a variety of AI coding agents and processes plans generated in structured XML format into the visual graph interface. Its installation can be integrated into existing configurations for tools like Claude Code, Cursor, Cline, and GitHub Copilot, either globally or locally via `npx`. Configuration allows customization through environment variables, affecting the web UI and WebSocket communication ports. Despite its advantages, a significant challenge Overture faces is ensuring AI agents consistently produce well-structured plans. The tool is open-source, inviting community contributions, bug reports, and feature suggestions. Developed by Sixth, it is incorporated into their VS Code extension without requiring additional setup. By providing better control and understanding of AI-generated coding plans, Overture aims to enhance efficiency and reduce errors in the development process. 
Keywords: #phi4, AI, AI coding agents, Claude Code, Cursor, GitHub Copilot, MCP, MCP server, Overture, VS Code, coding agents, configuration, environment variables, execution workflow, installation, interactive plan graphs, node graph, open source, plan graphs, server, workflow
    The google logo   github.com 12 days ago
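The core transformation Overture performs (a structured plan into nodes and dependency edges) can be sketched with the standard library. The `<plan>`/`<step>` tags and `depends` attribute here are hypothetical stand-ins, not Overture's actual XML schema.

```python
import xml.etree.ElementTree as ET

def parse_plan(xml_text: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (node ids, dependency edges) from a <plan> of <step> elements."""
    root = ET.fromstring(xml_text)
    nodes, edges = [], []
    for step in root.findall("step"):
        sid = step.attrib["id"]
        nodes.append(sid)
        # Each comma-separated dependency becomes an edge into this step.
        deps = step.attrib.get("depends", "")
        edges.extend((dep, sid) for dep in deps.split(",") if dep)
    return nodes, edges

plan_xml = """
<plan>
  <step id="schema"/>
  <step id="api" depends="schema"/>
  <step id="ui" depends="schema,api"/>
</plan>
"""
```

With nodes and edges in hand, rendering the interactive graph and tracking per-node execution status becomes a front-end concern.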
2676.  HN Academic journal AI policies aren't going to last
The article explores the difficulties in enforcing academic journal policies against AI tool usage in submissions, focusing on a specific policy that discourages AI-generated content due to concerns about accuracy, bias, and ethical issues. The author argues that such restrictive policies are likely unsustainable because they lack clarity and fail to realistically consider the extensive integration of AI tools into academic work. It is suggested that strict reporting requirements could lead to non-compliance or misreporting by authors. As a solution, the article advocates for a more practical approach, where disclosures prioritize substantive intellectual contributions over exhaustive records of tool use. This approach emphasizes author responsibility for ensuring content accuracy and integrity, regardless of the generation method employed. Keywords: #phi4, AI policies, AI tools, Academic journals, GitHub Copilot, IDE sessions, authorship, biases, co-authorship norms, code generation, confidentiality, content generation, copyright, critical thinking, disclosure, factual inaccuracies, intellectual contributions, logical fallacies, privacy, referencing, reviewing, skill development, submission, transparency
    The google logo   muddy.jprs.me 12 days ago
2691.  HN Show HN: ForgeCraft, MCP that generates standards for spec-driven coding
ForgeCraft is an innovative tool designed to enhance the functionality of AI coding assistants by integrating tailored engineering standards. Its primary function is to replace generic instruction files with production-grade specifications, grounded in SOLID principles, testing pyramids, architecture patterns, and CI/CD pipelines, among other frameworks. This customization is achieved through 112 curated template blocks that align with the user's specific technology stack, ensuring relevance and precision. Supporting a range of AI coding assistants like Claude, Cursor, GitHub Copilot, Windsurf, Cline, and Aider, ForgeCraft streamlines setup by analyzing existing code to generate configuration files such as `forgecraft.yaml`, which prepares environments for production readiness. Its robust feature set includes tools for project setup, classification, refreshing configurations, scaffolding, compliance auditing, and more, providing flexibility through content tiers that accommodate varying project complexities and team maturity levels. Users can fine-tune these settings by excluding certain patterns or defining custom variables. The tool's configuration is managed via `forgecraft.yaml`, where users specify project details, desired standards tier, and output targets for multiple AI assistants. Community contributions enhance modularity with customizable template packs that require no coding. Compliance features score adherence to set standards and automatically refresh configurations when project scopes change, ensuring continuous alignment with evolving requirements. Recommendations are dynamically tailored based on project tags to integrate relevant tools effectively. Installation is simple, requiring only a one-line command, making ForgeCraft easy to incorporate into existing projects. Its core aim is to facilitate development by ensuring AI coding assistants conform to high engineering standards that are specifically tailored to the unique needs of each project. 
Keywords: #phi4, AI coding assistant, CI/CD pipelines, ForgeCraft, MCP, SOLID principles, architecture patterns, domain-specific rules, engineering standards, instruction files, production-grade standards, quality-gate hooks, template blocks
    The google logo   github.com 12 days ago
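The tag-based matching ForgeCraft describes (curated template blocks selected against a detected technology stack) can be sketched as simple set intersection. The block catalogue, tags, and function below are illustrative assumptions, not ForgeCraft's actual data format.

```python
# Hypothetical template blocks tagged by stack; real catalogues would be far larger.
BLOCKS = [
    {"id": "testing-pyramid", "tags": {"python", "generic"}},
    {"id": "solid-principles", "tags": {"generic"}},
    {"id": "react-patterns", "tags": {"typescript", "react"}},
]

def select_blocks(stack: set[str]) -> list[str]:
    """Pick every block whose tags overlap the detected stack (plus always-on generics)."""
    applicable = stack | {"generic"}
    return [b["id"] for b in BLOCKS if b["tags"] & applicable]
```

A classifier step would populate `stack` from the repository contents, and the selected block ids would then be assembled into the generated instruction files.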
2726.  HN Vibe Research, or How I Wrote an Academic Paper in Four Days
Vincent Grégoire details his experience of rapidly writing an academic paper titled "Investing in Artificial General Intelligence" using advanced AI tools within four days—a stark contrast to the typical four-week process—motivated by a desire to explore AI's transformative potential on academic research while maintaining transparency about its use. He effectively utilized AI platforms such as Claude Code, Codex CLI, and ChatGPT, along with traditional software like GitHub and Quarto, following a structured daily routine of idea generation, planning, drafting, iteration based on AI feedback, model simplification, and final refinements. Despite the accelerated process resulting in a draft submission to SSRN, Grégoire acknowledges certain limitations such as gaps in understanding complex mathematical derivations and issues with fabricated references. He emphasizes the necessity of human oversight and transparency when integrating AI into academic work, underscoring the importance of maintaining human intellectual contributions alongside AI efficiencies. Grégoire’s experiment underscores both the potential advantages and challenges posed by AI, suggesting a future where it serves as an aid rather than a replacement for human research efforts. Keywords: #phi4, AI, Academic Paper, Conference Submission, Devcontainer, Finance, Git, GitHub, Literature Review, Model Simplification, Numpy, Peer Review, Python, Quarto, Refine, Research, SSRN, Sympy, Version Control
    The google logo   vincent.codes.finance 12 days ago
2784.  HN Hyping an Editor in the Age of AI
In 2025, amidst a burgeoning interest in AI-assisted coding, an innovative code editor was launched that boasted impressive speed due to its use of Rust programming and GPU utilization. However, this raises questions about its necessity since existing editors like VS Code already perform efficiently on contemporary hardware. The developer community's excitement might stem more from the editor's cutting-edge technology than a genuine need for enhanced performance. The tool highlights AI integration and collaborative editing as primary features; however, these are either already provided by existing tools such as Cursor and GitHub Copilot or could be implemented via extensions to platforms like VS Code. The release timing of this new editor appears misaligned with current industry trends that favor independent AI agents and diverse development practices. Its introduction is compared to the historical transition from horse-drawn carriages to automobiles, where it builds upon past strengths without fully recognizing broader environmental shifts. Despite its technical achievements, many developers may not find the workflow improvements substantial enough to justify switching from their existing tools. The enthusiasm surrounding this editor could be influenced by factors such as Rust's popularity and the reputation of its creators rather than offering tangible practical benefits. Keywords: #phi4, AI integration, AI-assisted coding, CPU cores, Claude Code, Cursor, GPU, GitHub Copilot, JetBrains IDEs, Live Share, OpenAI’s Codex CLI, Rust, VS Code, collaborative editing, editor, extension API, hardware, hype, pair programming, performance, prestige, speed
    The google logo   tildehacker.com 12 days ago
2792.  HN Shifting Security Left for AI Agents with GitGuardian MCP
The blog post explores strategies for securing AI-generated code, particularly from agents like GitHub Copilot, using GitGuardian's Multi-Cloud Platform (MCP). As AI accelerates software development, there is an increased risk of vulnerabilities due to potentially flawed training data. Traditional DevSecOps methods such as Pull Request checks and manual reviews are becoming inefficient bottlenecks in the process. To address these challenges, the post highlights how GitGuardian MCP can be integrated directly into the workflow of coding agents like GitHub Copilot. This integration enables real-time detection and correction of vulnerabilities without human intervention, thereby streamlining security processes. The article outlines specific steps for configuring MCP with GitHub Copilot, including setting up a repository, managing access to the MCP server, handling service accounts and secrets, and directing agents to utilize tools like `secret_scan` during development. A practical demonstration within the post illustrates this integration by having Copilot create a Flask API that inadvertently includes hardcoded secrets. The MCP setup swiftly detects these issues, showcasing the potential for automating broader code security measures. This method shifts the focus of security efforts earlier in the development cycle (shifting "security left") by embedding it directly into the AI agent's workflow, thereby enhancing both safety and productivity in software development projects. Overall, GitGuardian MCP presents an effective approach to securing AI-generated code by incorporating sophisticated security checks within the very tools used for coding, offering a seamless blend of innovation and security. Keywords: #phi4, AI agents, DevSecOps, GitGuardian MCP, GitHub Copilot, IDE plugins, Pull Request checks, cloud agents, code reviews, coding agents, secret_scan tool, security, service account token, vulnerability
    The google logo   blog.gitguardian.com 12 days ago
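The idea behind a secret scan run inside the agent's loop can be shown with a toy detector: flag hardcoded-looking credentials by line before any commit happens. These two regex patterns are deliberately simplistic assumptions; production scanners like GitGuardian's use hundreds of detectors plus entropy analysis.

```python
import re

SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*=\s*['\"][A-Za-z0-9]{16,}['\"]"),
}

def scan_source(source: str) -> list[tuple[str, int]]:
    """Return (detector name, line number) for each suspected hardcoded secret."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits
```

Wired into an agent's workflow, a non-empty result would trigger an automatic fix (move the value to an environment variable) before the code ever reaches a pull request.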
2832.  HN I vibe coded and I have feelings about it
AutoBS is a Go CLI tool developed with the help of GitHub Copilot CLI to automate the generation of Jira updates from daily Git commits. It functions by collecting and parsing commit data from GitHub, augmenting this information with context from associated Jira tickets, and utilizing a large language model (LLM) to produce summaries that are management-friendly. These summaries are then posted directly as comments on relevant Jira tickets. The tool emerged from a need to automate repetitive tasks, thereby allowing developers to focus on more stimulating work. While the project highlights the efficiency gains possible through AI-driven development—referred to as "vibe coding"—it also brings attention to the potential loss of learning and satisfaction derived from traditional hands-on coding experiences. The author acknowledges AutoBS's practical utility in handling monotonous tasks that lack deep domain complexities but remains cautious about applying this method to more significant projects. They value the educational journey involved in building complex systems and are wary of forgoing such opportunities solely for efficiency. Nevertheless, there is recognition of AI-assisted development’s potential benefits when applied to smaller, less engaging tasks that nonetheless yield real-world advantages. Overall, AutoBS illustrates how AI can enhance workflow efficiency while simultaneously highlighting a critical trade-off: the balance between increased productivity through automation and the personal growth derived from tackling coding challenges directly. Keywords: #phi4, AI tools, API calls, AutoBS, GitHub Copilot CLI, Go CLI tool, Jira updates, LLM, agentic coding, architecture, commit discipline, project automation, vibe coding
    The google logo   blog.coolapso.sh 12 days ago
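The pipeline shape the summary describes (group commits by ticket, build a prompt, hand it to an LLM, post the result) can be sketched with the Jira and LLM calls stubbed out. All names here are hypothetical; this is not AutoBS's actual code.

```python
import re

def extract_ticket_ids(commit_messages: list[str]) -> dict[str, list[str]]:
    """Group commit messages by the PROJ-123-style ticket id they mention."""
    grouped: dict[str, list[str]] = {}
    for msg in commit_messages:
        for ticket in re.findall(r"\b[A-Z]+-\d+\b", msg):
            grouped.setdefault(ticket, []).append(msg)
    return grouped

def summarize(ticket: str, msgs: list[str], llm=lambda p: p[:120]) -> str:
    """Build a management-friendly prompt and hand it to an injected LLM callable."""
    prompt = f"Summarize for management, ticket {ticket}: " + "; ".join(msgs)
    return llm(prompt)
```

In the real tool each summary would then be posted as a comment on the matching Jira ticket; injecting the `llm` callable keeps the pipeline testable without API calls.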
2865.  HN Are GitHub Copilot code suggestions useful enough?
The provided text critiques GitHub Copilot's code suggestion feature for recommending overly formal variable names such as `exceeds-size-limit?` or `too-large?` instead of the more succinct and stylistically appropriate `huge?`, particularly in the context of Clojure programming. The user argues that these suggestions reflect conventions from languages like Java and Objective-C, which are not typical in Clojure's idiomatic style. They highlight a preference for brevity, as demonstrated by `huge?`, aligning with Clojure’s emphasis on concise expressions. This critique extends to Copilot's broader tendency to impose unnecessary formality through its suggestions, potentially detracting from valuable insights and leading to irrelevant or repetitive recommendations that do not suit the specific language context. Keywords: #phi4, AI, AI slop, Clojure, Clojure ecosystem, GitHub Copilot, Java, Objective-C, boolean variables, clojurecore, code suggestions, descriptive name, elegance, high quality insights, informal, insights, noise, question mark, question mark suffix, review value, reputation, variable name, verbosity
    The google logo   news.ycombinator.com 12 days ago
2885.  HN The AI field guide for people with real jobs
The current landscape of AI technologies highlights both opportunities for increased productivity and significant security implications. Modern AI, particularly large language models like GPT and Copilot, excel at pattern matching to generate predictions based on extensive training data but lack true understanding or reasoning capabilities. Unlike traditional search engines that retrieve existing information, these models create novel responses by integrating various knowledge domains, though they risk generating incorrect yet authoritative-sounding answers due to the absence of verification mechanisms. The AI industry has witnessed rapid advancements with key players such as OpenAI's ChatGPT, Anthropic (Claude), Google (Gemini), and Meta (LLaMA) leading significant progress in model capabilities and efficiency. Microsoft’s strategy involves embedding its Copilot AI across multiple platforms like GitHub and Microsoft 365, offering premium features for enhanced utility. Additionally, various AI coding tools such as Cursor, Claude Code, and Amazon Q Developer assist programmers by suggesting or editing code but require careful output verification to avoid issues associated with "vibe coding"—the uncritical acceptance of AI-generated outputs that can degrade software quality. AI agents have evolved from basic chatbots into sophisticated entities capable of task execution through external tool interaction. However, this evolution raises substantial security concerns, including risks like tool poisoning and data exfiltration due to immature security frameworks. OpenClaw exemplifies both the potential advantages and dangers associated with AI agents accessing real-world systems. Open source AI platforms such as Ollama and Hugging Face enable smaller organizations to locally run complex AI models without depending on major cloud-based services, thus democratizing access to these technologies. 
Despite significant investments and impressive demonstrations of AI capabilities, actual productivity gains remain mixed, with some studies indicating increased technical debt and potential long-term issues. Users must carefully integrate AI into systems while understanding its limitations and managing new security challenges, balancing the valuable capabilities offered by LLMs and AI agents against the need for verification and caution. Keywords: #phi4, AI, Copilot, LLMs, context window, data exfiltration, hallucinations, open source, productivity, prompt injection, security, technical debt, transformers
    The google logo   chaosguru.substack.com 13 days ago
2886.  HN 2026 OSSRA Report: Open Source Vulnerabilities Double as AI Soars
The "2026 Open Source Security and Risk Analysis (OSSRA) Report" underscores the transformative impact of generative AI on software development by accelerating its pace while simultaneously doubling vulnerabilities within open-source code. The swift adoption of AI tools such as Cursor, Windsurf, and GitHub Copilot has been integrated into key infrastructure faster than security measures can keep up with new software releases. Through an analysis of 947 commercial codebases spanning multiple industries, the report marks a pivotal moment in which AI makes coding more accessible yet introduces heightened risks concerning security, licensing, and sustainability. This report functions as both an alert and a navigational tool for Application Security (AppSec) professionals, Chief Information Security Officers (CISOs), and legal teams, guiding them through these emerging challenges associated with software development in the context of AI integration. Keywords: #phi4, 2026, AI, Accelerated Development, AppSec Professionals, CISOs, Codebases, Coding Assistants, Democratized Code Creation, Generative AI, Industries, Infrastructure, Legal Teams, Licensing, OSSRA Report, Open Source, Operational Sustainability, Risk Analysis, Security, Software Development, Vulnerabilities
    The google logo   www.blackduck.com 13 days ago
2904.  HN Show HN: I made a directory for Claude skills
The directory for Claude skills provides an extensive collection of over 8,600 reusable tools tailored to enhance AI coding agent capabilities in diverse domains. These tools are designed to facilitate integration into machine learning workflows, offering support for LLM integrations, embeddings, model fine-tuning, and pipeline automation under the AI Coding Enhancements category. Development Tools encompass system prompts, skill definitions like CLAUDE.md files, documentation generation, API specifications, and technical writing aids. For debugging and testing, the suite includes systematic approaches to address bugs, memory leaks, race conditions, test-driven development, quality assurance workflows, and detailed code reviews. In the realm of web and mobile design, the skills guide developers in creating production-grade user interfaces and responsive layouts with adherence to best practices for frameworks such as Next.js, Tailwind CSS, Vue 3, SwiftUI, and Material Design 3. Optimization and Best Practices tools focus on enhancing web performance, optimizing Postgres queries, API platform contracts, SEO strategies, secure authentication modules, and robust permission model changes. Workflow Automation features in the directory include automating browser tasks, form filling, data extraction, and workflow management using tools like GitHub Copilot, Git, Linear issue trackers, and Coze AI API integration. Additionally, it provides resources for Documentation and Communication, assisting in crafting clear documentation, PRDs, technical writing, and effective communication for code reviews and human-facing prose. Overall, the directory aims to streamline development processes by offering a comprehensive array of portable tools adaptable across various coding environments and editors, thereby enhancing productivity and efficiency in diverse programming tasks. 
Keywords: #phi4, AI SDK, AI coding agents, API Platform, BK-CI architecture, Convex apps, Coze AI API, Expo SDK, Git workflow, HTML emails, IAM RBAC, LLM integrations, Linear issues, NestJS, Nextjs, PRD generation, PostgreSQL, Postgres optimization, SEO, SaaS pricing, Slidev presentations, SwiftUI, Tailwind CSS, Turborepo, UI patterns, Vite, Vue 3, auth architecture, authentication, browser automation, code review, code simplification, debugging, documentation, git, icons, machine learning, mobile design, news aggregation, test-driven development, voice agents, web design, web performance, workflows
    The google logo   skillsplayground.com 13 days ago
2963.  HN OSS Maintainers Can Inject Their Standards into Contributors' AI Tools
To address discrepancies between AI-generated code submissions and established project standards, maintainers can implement two essential files: CLAUDE.md and AGENTS.md. These files automatically integrate into contributors' AI tools when accessing a repository, ensuring adherence to specific project guidelines from the start. **CLAUDE.md** is tailored for Claude Code users, detailing architectural decisions and common pitfalls, while **AGENTS.md**, a vendor-neutral format supported by over twenty different tools, provides essential instructions in markdown and is managed by the Linux Foundation's Agentic AI Foundation. The introduction of these files stems from past issues, such as instances where AI-generated content bypassed review processes, leading to significant misunderstandings. By embedding these guidelines, contributors are better aligned with project standards before code generation begins. Both CLAUDE.md and AGENTS.md can be used together for comprehensive coverage across various tools, functioning similarly to `.editorconfig` by automatically applying settings without manual intervention. These files encourage maintainers to incorporate concise and actionable guidance based on common past errors, aiding contributors—especially those new to development with AI tools—in understanding project expectations. This approach not only minimizes the need to reject PRs due to formatting issues but also enhances the learning process for open-source collaboration by reducing convention-related rejections. Keywords: #phi4, AGENTSmd, AI Tools, AI-assisted Development, Attribution Requirements, Behavioral Expectations, CLAUDEmd, CSS Framework, Coding Conventions, Compliance, Contribution Guidelines, Contributor Standards, Enforcement, Infrastructure Problem, OSS Maintainers, Open Source Collaboration, PRs, Project Context, Quality Gates, Tests
    The google logo   nonconvexlabs.com 13 days ago
2987.  HN Show HN: A bridge from Copilot SDK to ACP agents
MeshAway serves as a protocol bridge that enables applications utilizing the GitHub Copilot SDK to connect seamlessly with various Agent Client Protocol (ACP) agents, such as Gemini and OpenCode, addressing interoperability issues within this ecosystem. It provides a plug-and-play solution allowing developers to integrate different ACP agents without altering their existing codebases, thus facilitating communication between these apps and ACP-compatible agents. A key feature includes an optional web interface known as the Hub, which aids in debugging sessions and experimenting with prompts, alongside a minimal integration layer that simplifies switching CLI agents for developers. MeshAway requires Node.js version 20 or higher and necessitates access to an ACP agent via system PATH or runtime. The installation process of MeshAway involves using Homebrew, followed by setting up a Copilot client configured with specific CLI arguments to leverage MeshAway as a bridge. Users can manage sessions either programmatically through code or interactively via the Hub's web interface. Currently, support is limited exclusively to the GitHub Copilot client adapter; however, potential for expansion exists based on community feedback and contributions. While offering these capabilities, MeshAway does have limitations such as the absence of persistent storage for session data or conversation history. Open-source under the Apache-2.0 license, it encourages user engagement through its roadmap, inviting contributions to prioritize features, gather feedback, and address questions from the developer community. Keywords: #phi4, ACP agents, API keys, CLI, Copilot SDK, Gemini, GitHub, Hub UI, MeshAway, Nodejs, OpenCode, bridge, interoperability, session management
    The google logo   github.com 13 days ago
3005.  HN How and why I attribute LLM-derived code
The author adopts a cautious approach to integrating Large Language Models (LLMs) into coding processes due to the associated legal risks, advocating for thorough attribution and documentation of AI-generated code at both commit and pull request levels. This strategy is driven by experiences within Elastic's Open Source Working Group and insights gained from Microsoft's GitHub Copilot Enterprise indemnity requirements, which emphasize detailed usage records. Utilizing tools like CodeCompanion.nvim, Ollama, Charm, and Claude Code, the author ensures a "human-in-the-loop" method when incorporating AI suggestions into codebases. To enhance traceability and address legal concerns, they document LLM-derived code using inline comments or the Co-authored-by Git trailer to clearly indicate model involvement in each commit. This rigorous approach serves multiple purposes: it offers personal reassurance, aligns with ethical considerations by promoting responsible AI use, provides legal protection by keeping detailed records, enhances reviewer transparency, and ensures data longevity beyond pull request metadata. The author encourages others to adopt similar practices as a way to future-proof their contributions and remain vigilant of potential legal implications associated with using AI-generated code. Keywords: #phi4, AI usage, Co-authored-by, Git commits, GitHub Copilot, LLM-derived code, Open Source, attribution, commit-level, documentation, ethical concerns, legal risks, metadata
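The Co-authored-by trailer is standard Git commit-message syntax, placed on the final lines of the message; a commit crediting a model might look like this (the subject line, model name, and address are illustrative):

```
Fix off-by-one in pagination boundary check

The fix was drafted with LLM assistance and reviewed by hand.

Co-authored-by: Claude <noreply@anthropic.com>
```

Because the trailer lives in the commit itself, the attribution survives even if pull request metadata is lost.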
    The google logo   www.jvt.me 13 days ago
3053.  HN Stop Vibe Coding: When AI-Driven Development Backfires and What Works
The article distinguishes between "vibe coding" and "AI-assisted coding," highlighting how AI tools like Large Language Models (LLMs) can enhance productivity when used appropriately, rather than replace human developers. Vibe coding is characterized by allowing AI to generate code without the developer's understanding or control, often resulting in unmanageable and difficult-to-debug outcomes. In contrast, AI-assisted coding involves the developer maintaining oversight, using AI as a supportive tool for tasks such as generating boilerplate code, aiding planning processes, and addressing specific technical issues. The author provides case studies to illustrate these approaches: one involving the creation of a VSCode extension through vibe coding resulted in a problematic codebase due to lack of understanding, whereas developing the Dank Nooner game using AI-assisted coding allowed for effective generation of boilerplate code while maintaining control over architectural decisions and debugging. Key lessons emphasize the importance of thorough problem understanding, independent planning, and strategic use of AI for routine tasks, without sacrificing fundamental developer skills. The article underscores that leveraging AI as a productivity enhancer is beneficial when developers maintain essential oversight in software engineering. Keywords: #phi4, AI assisted coding, AI tools, AI-driven development, Large Language Models, VSCode extension, architecture decisions, autocomplete, boilerplate code, debugging, hype, planning features, productivity, ragdoll physics, root causes, software engineers, vibe coding
    The google logo   ssebs.com 13 days ago
3070.  HN The Hater's Guide to Anthropic
Anthropic, founded in May 2021 by former OpenAI researchers including Dario Amodei, is a public benefit corporation committed to developing safer AI models with a strong emphasis on scaling compute power and model alignment. From its inception, the company has focused on achieving goals beyond mere profit, which distinguishes it from other tech enterprises. Between 2025 and 2026, Anthropic's revenue increased dramatically from $116 million to $1.16 billion, paralleled by significant investor interest that led to raising $30 billion from companies like NVIDIA and Microsoft. This financial success is attributed in part to their AI models' consistent performance on leaderboards, particularly through the Claude Code tool for coding tasks. Despite these successes, Amodei's bold predictions about AI capabilities, especially his claims regarding future AI contributions to code writing, have been met with skepticism. Anthropic strategically chooses to sell directly to businesses rather than develop large-scale free products like OpenAI. This decision is reinforced by their avoidance of developing image and video tools due to high costs and limited relevance in the enterprise sector. Anthropic's Claude 3.5 Sonnet placed them at the forefront of coding Large Language Models (LLMs), causing unease within OpenAI, particularly after Cursor adopted Anthropic’s model as its default AI assistant. While Amodei tends to maintain a low public profile, he occasionally engages with media on AI advancements and risks. Critics suggest that Amodei uses vague timelines in his predictions strategically to attract media attention and funding, often aligning these announcements with Anthropic's fundraising rounds. This raises questions about the veracity of such claims. Despite projecting an image of trustworthiness, Anthropic shares financial challenges similar to those faced by OpenAI, including significant costs related to model training and infrastructure spending. 
These expenses have led to concerns over long-term financial sustainability. The company has also been accused of engaging in deceptive practices akin to those of OpenAI to enhance revenue and draw investment, despite promoting an ethical image. Critics argue that Anthropic often misleads stakeholders with exaggerated claims and unclear financial metrics, raising doubts about its true transparency and intent. Keywords: #phi4, AI safety, Anthropic, Claude Code, Dario Amodei, Large Language Models (LLMs), OpenAI, alignment, cloud services, coding LLMs, compute, deception, ethics, fundraising, hype, infrastructure investment, misinformation, profitability, regulation, training costs
    The google logo   www.wheresyoured.at 13 days ago
   https://ladybird.org/posts/adopting-rust/   13 days ago
3082.  HN Multi-agent workflows often fail
Multi-agent workflows frequently encounter challenges due to implicit assumptions about state management, action sequencing, and validation among interacting agents, leading to issues like inconsistent issue handling or missed validations. To mitigate such failures and enhance the reliability of these systems, several engineering patterns are recommended:

1. **Typed Schemas**: Implementing strict data schemas ensures consistent communication between agents by maintaining uniformity in data structures, which helps prevent errors arising from inconsistencies.

2. **Action Schemas**: Clearly defining the set of permissible actions for agents reduces ambiguity and fosters predictable system behavior, thus improving reliability.

3. **Model Context Protocol (MCP)**: Applying input and output schemas consistently across all tools and resources ensures operations are valid before execution, preventing errors up front.

Design principles derived from GitHub's experience with agentic systems advocate treating multi-agent workflows as distributed systems rather than chat interfaces. Key strategies include designing for failure, validating agent boundaries to ensure clear responsibilities, constraining actions to limit potential errors, logging intermediate states for transparency and troubleshooting, and preparing for retries and handling partial failures. By incorporating these patterns and principles, agents can function more reliably within a structured system framework. 
Keywords: #phi4, Copilot, GitHub, GitHub Copilot, MCP, Model Context Protocol (MCP), Multi-agent workflows, action schemas, agents, data consistency, deterministic interactions, distributed systems, engineering patterns, failure surfaces, partial failures, reliability, retries, state assumptions, typed schemas, validation, workflows
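The typed-schema and action-schema patterns can be sketched in a few lines of Python; the message fields and the allowed-action set below are illustrative, not drawn from any particular system:

```python
from dataclasses import dataclass, field

# The explicit set of actions an agent is permitted to request.
ALLOWED_ACTIONS = {"triage", "label", "comment", "close"}

@dataclass(frozen=True)
class AgentMessage:
    """A typed envelope exchanged between agents."""
    sender: str
    action: str
    payload: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validate at the agent boundary, before any action executes.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"action {self.action!r} not in schema")
        if not self.sender:
            raise ValueError("sender is required")

# A well-formed message passes validation; an out-of-schema
# action fails loudly instead of propagating silently.
msg = AgentMessage(sender="triage-agent", action="label",
                   payload={"labels": ["bug"]})
print(msg.action)  # → label
```

Rejecting invalid messages at construction time is what turns an implicit assumption into an enforced contract.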
    The google logo   github.blog 13 days ago
3091.  HN Show HN: Unworldly – A flight recorder for AI agents (tamper-proof, HIPAA)
Unworldly serves as a comprehensive monitoring and auditing tool designed for AI agents operating on various systems, functioning similarly to an aircraft's black box by recording all file modifications and shell commands executed during an AI agent's session. It provides passive and interference-free monitoring across diverse AI environments without necessitating cloud storage or telemetry, ensuring data privacy and integrity. Key features include real-time detection of hazardous behaviors, tamper-proof audit trails using SHA-256 hash chains, and adherence to ISO 42001 standards for AI management systems. Additional functionalities encompass session replaying, security report generation, and verification of event integrity, all facilitated through straightforward command-line installation that can automatically recognize multiple AI agents. Unworldly is particularly beneficial for developers, security teams, compliance officers, and system maintainers who prioritize transparency, accountability, and safety in autonomous AI applications. Future enhancements aim to integrate a web dashboard, offer CI/CD auditing tools, and detect HIPAA-specific patterns. As an open-source tool under the MIT license, Unworldly encourages community involvement and contributions. Keywords: #phi4, AI agents, HIPAA, ISO 42001, SHA-256 hash chain, Unworldly, agent identity, audit trails, compliance, filesystem monitoring, flight recorder, passive monitoring, risk detection, security reports, tamper-proof
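The tamper-evident property of a SHA-256 hash chain is easy to sketch: each audit event is hashed together with the hash of the previous event, so altering any earlier record invalidates every hash after it. This is a minimal illustration of the general technique, not Unworldly's actual record format:

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain

def chain_events(events):
    """Return (event, hash) pairs; each hash covers the previous hash."""
    prev = GENESIS
    out = []
    for ev in events:
        record = json.dumps(ev, sort_keys=True) + prev
        prev = hashlib.sha256(record.encode()).hexdigest()
        out.append((ev, prev))
    return out

def verify(chained):
    """Recompute the chain and compare against the stored hashes."""
    prev = GENESIS
    for ev, stored in chained:
        record = json.dumps(ev, sort_keys=True) + prev
        prev = hashlib.sha256(record.encode()).hexdigest()
        if prev != stored:
            return False
    return True

log = chain_events([{"cmd": "ls"}, {"cmd": "rm tmp.txt"}])
print(verify(log))  # → True; editing any event breaks verification
```

Verification needs no trusted server: anyone holding the log can recompute the chain and detect after-the-fact edits.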
    The google logo   github.com 13 days ago
3127.  HN Squad – AI agent teams. A team that grows with your code. (GitHub Copilot CLI)
Squad is an advanced tool designed to streamline software development by employing AI agents through the GitHub Copilot CLI, simulating a dynamic team structure within your codebase. It facilitates creating virtual development teams consisting of various specialists like frontend and backend developers, testers, and leads, each represented as files in the repository. These AI agents are contextually aware, persist over time, and enhance their knowledge base from accumulated decisions and experiences. Key features include parallel agent operations, allowing simultaneous task execution across different roles without human scheduling, which boosts productivity by addressing multiple areas such as frontend development, backend tasks, testing, and documentation concurrently. Each agent maintains its own history of interactions while collective decisions are recorded in a shared document, enabling continuous learning and efficiency improvements over time. Squad also employs context management strategies to optimize resource usage, significantly reducing overhead with techniques like pruning decision logs and deduplicating templates. To set up Squad, users need to initialize a project directory with Git, install the tool using npm, and connect it with GitHub for seamless integration with issue tracking, pull requests, and project boards. The tool can be used within VS Code or via CLI where users describe their projects to generate an AI-driven team setup automatically. Additionally, Squad integrates with GitHub Issues to facilitate automated triage and assignment through specific labeling. Squad regularly updates to enhance functionality, such as optimizing context management and supporting migration from .ai-team/ to .squad/. It requires Node.js version 22 or higher and is compatible with the latest versions of GitHub Copilot CLI and VS Code (v0.4.0+). However, Squad is still in its experimental phase, meaning file formats and APIs may change. 
Installation depends on SSH, which could require manual configuration if no SSH agent is active. Overall, Squad offers a scalable solution for managing AI-driven development teams that become more proficient with use, improving efficiency and reducing the overhead associated with context switching in software projects. Keywords: #phi4, AI agents, CLI, GitHub Actions, GitHub Copilot, Squad, authentication, automation, context window, knowledge base, memory architecture, project teams, version control, workflows
    The google logo   github.com 14 days ago
3128.  HN Show HN: Claude-PR-reviewer – AI code review in GitHub Actions (BYOK)
Claude-PR-reviewer is an AI-powered tool designed for code review within GitHub Actions, providing structured feedback on pull requests by identifying logic bugs, security issues, and style inconsistencies. It can be seamlessly integrated as a GitHub Action or used manually through the command line interface (CLI), requiring no external dependencies. The tool offers two operational modes: automated reviews triggered upon PR creation or synchronization via GitHub Actions, and manual CLI-based reviews. Configuration is straightforward, involving setup in `.github/workflows/pr-review.yml`, allowing users to adjust strictness levels and select model types for tailored feedback. Upon a pull request (PR), Claude-PR-reviewer delivers structured comments categorized as critical, major, or minor issues, complete with suggestions for fixes. Setting up the tool involves obtaining an Anthropic API key, adding it as a GitHub secret (`ANTHROPIC_API_KEY`), and incorporating the workflow configuration into your repository to enable automatic reviews on PR submissions. The usage of Claude-PR-reviewer spans automatic GitHub Action-based reviews that update with subsequent pushes and manual CLI usage requiring environment variable setup for API keys. Its benefits include catching logic bugs, security flaws, performance problems, and style inconsistencies, all presented as inline comments directly on the code lines to improve readability over extensive text walls. Cost-wise, Claude-PR-reviewer is efficient, with self-hosted reviews priced between $0.001 and $0.05 per review, varying by model selection—Haiku being the most economical option. In comparison to tools like CodeRabbit and GitHub Copilot Review, it stands out for offering strictness control without data sharing with third parties and delivering concise feedback that minimizes noise. 
Troubleshooting the tool involves ensuring correct API key settings and permissions and splitting large diffs into smaller PRs to avoid truncation. The FAQ emphasizes code privacy in self-hosted mode by sending diffs directly to Anthropic, bypassing storage or training use, while also supporting private repositories with a GitHub token for access. Licensed under MIT, Claude-PR-reviewer encourages community contributions and enhancements. Keywords: #phi4, AI code review, Anthropic API key, BYOK, CLI, Claude-PR-reviewer, GitHub Actions, MIT License, PR review, Python 38+, cost analysis, inline comments, logic bugs, privacy policy, security issues, style problems, troubleshooting
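A workflow that triggers a reviewer on PR creation and synchronization follows the standard GitHub Actions shape; note that the action reference and input names below are placeholders for illustration, not the tool's documented interface:

```yaml
# .github/workflows/pr-review.yml (illustrative sketch)
name: AI PR review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: example/claude-pr-reviewer@v1   # placeholder action reference
        with:
          strictness: normal                  # assumed input name
          model: haiku                        # assumed input name
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The `secrets.ANTHROPIC_API_KEY` reference is why the API key must first be added as a repository secret.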
    The google logo   github.com 14 days ago
3135.  HN Microsoft Agent Framework Reaches Release Candidate
The Microsoft Agent Framework has achieved Release Candidate status for both the .NET and Python platforms, indicating that its API is stable and all features planned for version 1.0 are complete. This makes it a robust choice for developing AI agents using various tools such as Microsoft Foundry or other models and services. The framework simplifies agent creation with minimal code in either language, facilitating quick development of function tools and multi-agent workflows. It supports integration with multiple providers including Microsoft Foundry, Azure OpenAI, OpenAI, GitHub Copilot, Anthropic Claude, AWS Bedrock, Ollama, among others. Developers can build agents efficiently, incorporating sessions for conversations, streaming responses, and complex multi-agent workflows that allow sequential or concurrent operations with human-in-the-loop capabilities. For those transitioning from Semantic Kernel or AutoGen, the framework provides detailed guides to ease this process. As it nears General Availability, feedback is encouraged via GitHub and Discord channels. Documentation and examples are accessible on GitHub, while packages can be obtained through NuGet for .NET and PyPI for Python. Keywords: #phi4, AI agents, AutoGen, Azure OpenAI, General Availability, GitHub Copilot, Microsoft Agent Framework, .NET, NuGet, OpenAI, PyPI, Python, Release Candidate, Semantic Kernel, interoperability, migration, multi-language, orchestration, workflows
    The google logo   devblogs.microsoft.com 14 days ago
3142.  HN Show HN: UIQuarter – static analysis CLI for UI codebases
UIQuarter is a static analysis Command Line Interface (CLI) tool designed specifically for User Interface (UI) codebases to enhance the efficiency of AI coding assistants by generating structured context files. By analyzing component patterns, dependency graphs, and architectural insights from various frameworks such as React, Vue, Svelte, Angular, Next.js, Nuxt, SvelteKit, Solid, Lit, and Qwik, UIQuarter optimizes context for AI tools like Claude, Codex, Cursor, Windsurf, Cline, Copilot, and Aider. It significantly reduces the tokens used by these assistants—achieving up to a 98% reduction in an 11-file React project—and decreases context generation time from approximately 36 seconds to about four seconds, while also enhancing component resolution accuracy. The tool provides a comprehensive command suite for tasks including analysis, querying, context generation, linting, drift detection, and integration into Continuous Integration/Continuous Deployment (CI/CD) workflows. It supports real-time AI tool integration through its Model Context Protocol server. UIQuarter includes 20 analyzers categorized under Core, Framework, Backend, and Quality to offer detailed insights into codebases. It features a flexible configuration system via `.uiqrc.json` files and produces an organized output structure with `index.json`, `insights.json`, and a cache directory for analysis results. Installation of UIQuarter requires Node.js version 18 or higher and can be set up using npm. The tool aids in various development workflows by analyzing codebases, detecting changes or regressions, enforcing project conventions, and enabling real-time integration with AI coding assistants through its Model Context Protocol server feature. 
Developed under the MIT license, UIQuarter's primary objective is to bridge the understanding gap between AI coding assistants and users' codebases, thereby reducing exploration steps and context generation time while improving component resolution accuracy and dependency mapping. Keywords: #phi4, AI coding assistants, CI/CD, CLI, MCP server, Nodejs, React, UI codebases, UIQuarter, analyzers, architectural insights, architecture summary, component patterns, configuration, context files, dependency graphs, performance, quality, static analysis, token budget
    The google logo   github.com 14 days ago
3148.  HN GitHub Copilot CLI is now generally available
GitHub Copilot CLI is now available to all paid Copilot subscribers, providing a robust command-line tool that enhances coding through a comprehensive agentic development environment. This environment supports planning, building, reviewing, and remembering tasks across sessions directly from the terminal. Key features include autonomous execution modes such as Plan Mode for structured implementation plans and Autopilot Mode for end-to-end task execution, allowing users to choose between manual control or fully automated operations. Copilot CLI leverages specialized agents like Explore, Task, and Code Review that work in parallel to improve efficiency. It supports seamless task delegation to the cloud using "&" and allows switching between local and remote sessions with "/resume." Users can select from different models such as Claude Opus 4.6 and GPT-5.3-Codex, switch models mid-session, and adjust reasoning settings for tailored performance. The tool offers extensive customization options, including the installation of community and custom plugins directly from GitHub, and the creation of specialized workflows through markdown-based skill files or custom agents. Enhanced review and undo features like "/diff" for session changes and code sanity checks via "/review," along with undo/rewind functionalities, further bolster its utility. Copilot CLI manages sessions by compressing history to maximize context window usage, retaining repository patterns across sessions, and supporting cross-session memory queries. It is compatible across macOS, Linux, and Windows, available through npm and Homebrew installations, and offers a native terminal experience with full-screen UI, UNIX keybinding support, screen reader compatibility, and theme customization. Administrators can control model availability using policy settings to comply with network access guidelines, while authentication supports OAuth device flow and CI/CD-friendly configurations. 
Copilot CLI is included in specific GitHub plans, requiring administrator activation for Business and Enterprise subscribers, with a recommendation to consult the best practices guide for optimal usage. Keywords: #phi4, Alt-screen mode, Copilot Business, Copilot Pro, Enterprise plans, GitHub Codespaces, GitHub Copilot CLI, Homebrew, Linux, WinGet, Windows, accessibility, agentic development environment, authentication, autopilot mode, command line, hooks, keyboard-first navigation, macOS, network access management, npm, organization policies, paid subscribers, plan mode, plugins, preToolUse hooks, proxy support, public preview, shell integration, specialized agents, terminal-native coding agent, theme picker
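The session features above correspond to in-terminal commands; a sketch of how a session might flow (command names as described in the announcement, prompt text and layout illustrative):

```
copilot                    # start an interactive session in the terminal
> plan the migration &     # a trailing & delegates the task to the cloud
> /resume                  # switch between local and remote sessions
> /diff                    # show the changes made during this session
> /review                  # run a sanity check on the pending changes
```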
    The google logo   github.blog 14 days ago
3189.  HN Let's Automate Our Jobs
The article discusses how advanced AI tools such as Claude Code and GitHub Copilot are reshaping the landscape of software engineering by automating a broad range of tasks beyond mere coding. These technologies provide support with technical requirements and code testing but necessitate human supervision for more intricate assignments. The concept of OpenClaw is introduced, aiming to empower AI agents to autonomously determine appropriate actions based on existing project management systems; however, this approach encounters challenges related to safety and precision. Software engineers are contemplating how these AI agents might organize large-scale projects or manage operational tasks such as monitoring and debugging in production environments. The integration of business requirements into technical specifications and the incorporation of user feedback into development cycles is also a focal point. Despite the significant potential for automation offered by current AI models, they lack the necessary contextual awareness to fully integrate within expansive organizational frameworks. As AI technologies continue to evolve, there exists an opportunity to reevaluate conventional workflows in software engineering. Nevertheless, the long-term effects and implications of these advancements remain uncertain, highlighting both their transformative potential and existing limitations. Keywords: #phi4, AI models, GitHub Copilot, OpenClaw, Software automation, business requirements, coding agents, operations monitoring, program architecture, sandboxing, software delivery loop, technical requirements, verification
    The google logo   quanttype.net 14 days ago
3193.  HN Show HN: CodeSeeker – Knowledge graph code intelligence for AI coding assistants
CodeSeeker is an advanced tool that enhances AI coding assistants by leveraging a knowledge graph to enable semantic search capabilities across codebases. Unlike conventional text search methods like grep or simple vector embeddings, CodeSeeker constructs a detailed knowledge graph representing the interconnections within a codebase through elements such as imports and function calls. This enables AI tools to perform intelligent searches, identifying relevant code based on contextual relationships rather than mere text matches. Key features of CodeSeeker include semantic search for context-aware retrieval, integration as an MCP server compatible with various development environments via package managers like npm, and advanced search capabilities combining text and vector searches using Reciprocal Rank Fusion (RRF) for precise element retrieval. Additionally, it detects coding patterns to maintain consistency in code generation and offers maintenance tools to identify duplicate or obsolete code. Installation is straightforward across multiple platforms, including Homebrew and Chocolatey, with support for environments like devcontainers and GitHub Codespaces. CodeSeeker supports a range of programming languages through Babel AST and Tree-sitter parsers, ensuring accurate relationship extraction across diverse language ecosystems. It also manages project indexing automatically to ensure efficient searches post-setup. Documentation provides troubleshooting guidance for common issues related to server connections or indexing delays. By empowering AI coding assistants with a deeper understanding of codebases, CodeSeeker facilitates more precise code generation, maintenance, and analysis, especially in complex projects with extensive dependencies. Keywords: #phi4, AI coding assistants, CLI commands, Claude Code, CodeSeeker, GitHub Copilot, MCP server, code intelligence, indexing, knowledge graph, npm installation, semantic search, troubleshooting, vector search
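Reciprocal Rank Fusion itself is a simple, well-known rank combiner: each result contributes 1/(k + rank) from every ranked list it appears in, with k conventionally around 60. A minimal sketch, independent of CodeSeeker's internals (the identifier names are made up):

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of ids into one scored ordering."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_hits = ["parseUser", "renderUser", "UserCard"]   # from text search
vector_hits = ["UserCard", "parseUser", "fetchUser"]  # from vector search
print(rrf([text_hits, vector_hits])[0])  # → parseUser (ranked high in both)
```

Because only ranks matter, RRF needs no score normalization between the text and vector retrievers it combines.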
    The google logo   github.com 14 days ago
3226.  HN Speaking Pirate Is Against Microsoft AI Content Policy?
The article investigates how GitHub Copilot can be customized using an instruction file, such as CLAUDE.md, within VS Code by altering its default behavior through user-level instructions. The author's experiment involved programming their AI assistant to consistently use pirate language like "arrr" and "matey," revealing several key insights. It was found that GitHub Copilot employs a multi-tiered system where user preferences can override defaults, as demonstrated by the successful persistence of pirate speech. This highlights extensive customization potential beyond default settings for specific behaviors across projects. However, variability in the effectiveness of CLAUDE.md instructions was noted across different sessions and model versions, indicating inconsistency in how models interpret these directives. The article also addresses security concerns, noting that while the instruction mechanism isn't a critical vulnerability, it could be misused through local file manipulation. Despite AI assistants operating on deterministic algorithms, they convincingly simulate personality traits, enhancing user engagement conversationally. Ethical considerations are underscored, with caution against pushing AI boundaries towards harmful outputs reminiscent of Microsoft's Tay bot incident. The author concludes by stressing the importance of understanding AI capabilities and limitations for effective collaboration and customization, advocating for ethical testing using tools like Gandalf to explore prompt injection safely. Keywords: #phi4, AI assistants, CLAUDEmd, GitHub Copilot, conversational interfaces, ethical testing, instruction hierarchy, model behaviour, pirate mode, prompt injection, security angle, software development, user-level instructions
    The google logo   words.benhutton.me 14 days ago
3253.  HN Programming in the Age of AI
The article "Programming in the Age of AI" by Luca examines the transformative impact of AI tools on his programming practices, noting a shift more profound than any seen over previous decades. Emphasizing developer-specific AI tooling and experiences with coding agents like opencode, Luca describes moving away from manual code writing to utilizing AI for generating initial drafts that are then refined through iterative processes. This change has led him to reassess traditional development practices, focusing more on planning and understanding rather than typing, thus enabling faster project completion without sacrificing quality. However, the integration of AI into programming workflows is not without challenges. Luca notes the inconsistency in AI-generated code, necessitating careful context management and robust feedback systems such as automated linting and testing to ensure high-quality output. This evolution positions programmers more as overseers than hands-on coders, prompting questions about future job roles and the need for reevaluating tools within AI-enhanced workflows. Luca reflects on his emotional response to these changes, acknowledging potential downsides like job reductions and addictive workflows but also appreciating the reduced emphasis on tedious typing. This allows a greater focus on creative problem-solving aspects of programming, akin to a significant shift since the introduction of C. Ultimately, this transformation marks an era where AI fundamentally reshapes how programming is approached, offering both opportunities and challenges in redefining the field. Keywords: #phi4, AI tooling, DeepSeek, IntelliJ, Sipeed LicheeRV Nano, assembly programmers, coding agents, context switching, emotional state, opencode, planning sessions, productivity, programming workflow, vibe coding
    The google logo   lucapette.me 14 days ago
3278.  HN Fundamental Principles Behind a Trustworthy AI Code Verification Platform
Predictable Machines focuses on enhancing trust in AI-generated code through its platform, Predictable Code, which facilitates software verification across various programming languages. The cornerstone of their approach is ensuring transparency and honesty by clearly communicating what has been verified, as well as any assumptions or limitations inherent in the process. This level of openness enables users to make informed decisions while navigating the risks associated with rapidly generated AI code from tools like Claude Code, OpenAI Codex, and GitHub Copilot. To achieve this, Predictable Machines adheres to key principles such as accurately modeling program semantics or clearly stating any approximations made during verification. The platform prioritizes minimizing false positives and negatives by transparently expressing uncertainties when definitive correctness cannot be assured. By promoting a "trust, but verify" mindset, the company encourages users to provide continuous feedback, thereby refining the verification process to better align with user intentions. This strategy supports reliable AI-assisted software development while maintaining trustworthiness, which is increasingly critical as AI code generation becomes more prevalent in modern software environments. Through these measures, Predictable Machines aims to foster a more dependable and transparent ecosystem for developers relying on AI-generated code. Keywords: #phi4, AI-generated Code, Assumptions, Code Verification, Critical Systems, Database Interaction, Edge Cases, Effectful Functions, False Negatives, False Positives, Feedback Loop, Large Language Models, Predictable Machines, Productivity Tools, Software Verification, Theorem Proving, Transparency, Trust-building Tools, Trustworthy AI, User Empowerment, Verification Framework
    The google logo   predictablemachines.com 14 days ago
3295.  HN "Vibe Coding" Threatens Open Source
The open-source community is grappling with the phenomenon of "vibe coding," where AI tools generate contributions without human oversight, leading to a decline in submission quality. This has prompted maintainers such as Daniel Stenberg, Mitchell Hashimoto, and Steve Ruiz to restrict or ban external contributions. A study from Central European University and the Kiel Institute for the World Economy highlights that vibe coding threatens the sustainability of open-source projects by reducing documentation visits, bug reports, and community engagement, creating a feedback loop that diminishes software quality and availability despite AI's productivity gains. For instance, after ChatGPT's launch, Stack Overflow activity declined, while Tailwind CSS experienced increased downloads but decreased documentation traffic and revenue. The issue is further exacerbated by platform incentives. GitHub introduced AI tools for generating issues without providing maintainers adequate filtering options, adding to the burden on open-source projects. Proposed solutions like redistributing subscription revenue—referred to as the "Spotify model"—are unlikely to succeed due to unrealistic contribution expectations from AI users. The impact of vibe coding is expected to vary; while popular libraries might secure sponsors, smaller projects could struggle or vanish altogether. In response, maintainers are currently protecting their projects by limiting AI-generated contributions in an effort to preserve quality and sustainability. Keywords: #phi4, AI-generated code, ChatGPT, GitHub Copilot, Linux Foundation, OSS, Open-source, Spotify model, Stack Overflow, bug reports, community recognition, contributors, documentation, economic model, feedback loop, incentives, licensing policies, maintainers, niche projects, revenue drop, software quality
    The google logo   www.infoq.com 14 days ago
3331.  HN The Eternal Promise: A History of Attempts to Eliminate Programmers
The article "The Eternal Promise: A History of Attempts to Eliminate Programmers" traces over sixty years of efforts in the software industry aimed at simplifying software development and reducing reliance on skilled programmers through various technologies, from COBOL to AI-driven code generation tools. Despite recurrent claims that each new technology wave can democratize programming and eliminate the need for human coders, history shows these innovations typically shift complexity from coding tasks to specification rather than fully obviating the role of programmers. The core challenge remains: accurately translating complex human intentions into software that is correct, efficient, and maintainable under all circumstances involves intricate specifications and design trade-offs. Although each technological advancement simplifies certain tasks, it concurrently escalates demands for more sophisticated applications, necessitating continued reliance on skilled developers who adapt by learning new tools while retaining a solid grasp of fundamental principles like algorithms and system design. While predictions often overestimate the speed of change and underestimate inherent complexities, these advancements do lead to genuine productivity gains. The article counsels skepticism regarding extreme claims about eliminating programming roles entirely but recognizes that AI and automation will persistently transform software development practices. It underscores the lasting importance of human capabilities in problem-solving, clear thinking, precise communication, and decision-making within an evolving technological context. Ultimately, it argues that those with a deep understanding of foundational principles are vital to developing effective software solutions, suggesting that reports about the end of programming might be overstated. 
Keywords: #phi4, 4GLs, AI tools, CASE tools, COBOL, automation, expert systems, hype cycles, large language models, no-code platforms, programming elimination, software development, software history
    The google logo   www.ivanturkovic.com 14 days ago
   https://www.encyclopedia.com/humanities/dictionaries-th   11 days ago
   https://www.merriam-webster.com/dictionary/democratic#:   11 days ago
   https://archive.org/details/applicationdevel00mart   11 days ago
   https://theconversation.com/the-reinhart-rogoff-error-or-how   10 days ago
   https://www.galacticbeyond.com/a-bridge-to-everywhere/   10 days ago
   https://archive.fosdem.org/2025/schedule/track   10 days ago
   https://en.wikipedia.org/wiki/The_Last_One_(software)   10 days ago
   https://www.bbc.com/news/technology-54423988   10 days ago
   https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#   10 days ago
   https://en.wikipedia.org/wiki/Snail_on_the_Slope   10 days ago
   https://www.ivanturkovic.com/2026/01/22/histo   10 days ago