87.
HN
Show HN: AgentDX – Open-source linter and LLM benchmark for MCP servers
AgentDX is an open-source tool developed to evaluate and enhance the performance of Model Context Protocol (MCP) servers by addressing common issues such as unclear tool descriptions, incomplete schemas, and ambiguous naming conventions that can impede interactions with Large Language Models (LLMs). The tool comprises two principal commands: **Lint** and **Bench**. The Lint command conducts static analysis on MCP server components using 18 predefined rules without requiring an LLM or configuration, yielding a lint score to highlight potential problems. Meanwhile, the Bench command assesses how effectively LLMs can interact with the server by evaluating tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery capabilities. This evaluation results in an Agent DX Score ranging from 0 to 100, reflecting the server's usability for AI agents.
AgentDX streamlines the process of detecting server entry points, functioning as an MCP client, and automatically generating test scenarios. It is developed in TypeScript under the MIT license and is currently in its early alpha phase, with future plans to enhance speed through parallelization techniques. The tool supports various LLM providers, including Anthropic, OpenAI, and Ollama, and can be integrated into Continuous Integration (CI) workflows using GitHub Actions. Additionally, it offers configuration options for customization and encourages community contributions, providing comprehensive documentation on its technical specifications, architecture, and future development roadmap.
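For illustration, the two commands could be chained into a CI gate along these lines; the subcommand names come from the summary above, but the exact CLI flags and paths are assumptions rather than documented AgentDX usage.

```python
# Hedged sketch: "lint" and "bench" are the documented command names, but the
# invocation details below are assumptions, not AgentDX's actual CLI syntax.
import subprocess
import sys

def run(cmd):
    """Run a command, echo it, and return its exit code."""
    print("$", " ".join(cmd))
    return subprocess.run(cmd).returncode

# Static analysis first: per the summary, no LLM or configuration is required.
lint_rc = run(["agentdx", "lint", "./my-mcp-server"])    # hypothetical invocation

# LLM-backed benchmark producing the 0-100 Agent DX Score.
bench_rc = run(["agentdx", "bench", "./my-mcp-server"])  # hypothetical invocation

sys.exit(lint_rc or bench_rc)  # fail the CI job if either step reports problems
```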
Keywords: #phi4, Agent DX Score, AgentDX, Anthropic, CI integration, CLI, GitHub Code Scanning, LLM benchmark, MCP servers, MIT license, Ollama, OpenAI, TypeScript, concurrency, configuration, error handling, lint score, linter, naming conventions, scenarios, schemas, static analysis, tool descriptions
github.com 6 hours ago
|
124.
HN
Show HN: Experience-engine – reflection-based memory layer for local LLMs
The "Experience-Engine" is an innovative memory layer designed to augment local Large Language Models (LLMs) by enabling them to leverage past interactions rather than initiating each conversation anew, thus addressing a fundamental limitation in AI systems' contextual awareness and personalized response capabilities. It features a two-layer pipeline: the first layer processes user interactions into domain-specific beliefs (V1), while the second synthesizes these beliefs into cognitive patterns (V2) that inform contextually aware responses. This system is designed for easy installation with Python 3.10+ and supports Ollama as an LLM option without additional dependencies.
The engine's functionality extends to logging interactions, extracting domain beliefs, synthesizing insights into cognitive patterns, formatting these insights into prompts for enhanced AI interaction, and applying learned patterns to new scenarios. It generates outputs in two forms: V1, which includes domain-specific knowledge, and V2, encompassing broader cognitive patterns like decision archetypes and user goal tensions. These capabilities allow the engine to improve AI responses by making them aware of past interactions and user-specific cognitive tendencies, thus providing more personalized advice that aligns with individual preferences such as "control-first" architecture or deterministic progression biases.
The Experience-Engine offers customizable configuration options through a configuration object or environment variables. It also supports interactive Command-Line Interface (CLI) tools for logging, reflecting, synthesizing, and displaying data, with flexibility to integrate other LLMs by using custom callables beyond Ollama. Future developments in the roadmap include implementing confidence decay for patterns, tracking AI advice outcomes, resolving cognitive tensions, detecting shifts in decision archetypes over time, and adding adapters for OpenAI and Anthropic models. Released under the MIT license, the Experience-Engine is poised to significantly enhance the contextual awareness and personalization of AI interactions.
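The summary does not quote the library's actual interface, so the sketch below only mirrors the described flow (log an interaction, reflect it into V1 beliefs, synthesize V2 patterns, then feed them into a prompt) using hypothetical names and a local Ollama backend.

```python
# Hypothetical API sketch of the two-layer pipeline described above; none of these
# class or method names are confirmed by the project.
from experience_engine import ExperienceEngine  # assumed package/class name

engine = ExperienceEngine(llm="ollama")  # assumed: Ollama as the LLM backend

# Layer 1: log a raw interaction and distill it into domain-specific beliefs (V1).
engine.log("I rewrote the scheduler to avoid background threads; I want full control.")
beliefs = engine.reflect()               # assumed method name

# Layer 2: synthesize beliefs into broader cognitive patterns (V2).
patterns = engine.synthesize()           # assumed method name

# Format the learned context into a prompt prefix for the next conversation.
print(engine.to_prompt(beliefs, patterns))  # assumed method name
```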
Keywords: #phi4, CLI, Experience-engine, LLMs, Ollama, Python, cognitive patterns, confidence decay, domain beliefs, interaction log, local storage, memory layer, outcome tracking, reflection-based
github.com 9 hours ago
|
222.
HN
Show HN: KrillClaw – 49KB AI agent runtime in Zig for $3 microcontrollers
KrillClaw is an innovative AI coding agent developed in Zig, specifically designed to operate on $3 microcontrollers within a compact 150-180 KB footprint, making it the world’s smallest autonomous coding agent. It features zero dependencies and minimal resource requirements, allowing seamless integration with various language models like Claude, OpenAI, or Ollama for autonomous tool execution. KrillClaw supports multiple runtime environments including BLE/Embedded systems through three transport layers: HTTP, BLE, and Serial. Its design includes different profiles—coding, IoT, and robotics—with compile-time profile selection to ensure zero runtime overhead and tailored tools such as bash execution, file operations, and search functionalities suited for specific applications.
To get started with KrillClaw, users need Zig 0.13+ installed from the official website, followed by building KrillClaw using `zig build -Doptimize=ReleaseSmall`. Integration requires setting up an API key (e.g., ANTHROPIC_API_KEY) to connect with AI models and allows interactive or one-shot command operations. The coding profile caters to general coding tasks, while the IoT profile is designed for applications like MQTT and HTTP requests, and the robotics profile includes safety features such as e-stop commands.
Security guidance advises against running KrillClaw with elevated privileges, especially since the BLE and Serial transports currently lack encryption and authentication, and recommends using it only in trusted environments. Architecturally, KrillClaw relies on custom components such as a hand-rolled JSON parser for efficiency, vtable-based transport layers for flexibility across communication protocols, and a fixed-size arena allocator to manage memory effectively on embedded targets.
Despite its strengths, KrillClaw has limitations such as a flat JSON parser design and heuristic token estimation. It also intentionally avoids regex support in its search tool to maintain a minimal footprint. Future enhancements may address issues like conversation persistence and cross-platform serial configuration. Licensed under BSL 1.1 with a transition to Apache 2.0 after four years, KrillClaw exemplifies the potential of integrating AI capabilities into highly efficient packages for low-resource environments, advancing microcontroller-based applications significantly.
Keywords: #phi4, AI agent, BLE, Claude, FNV-1a loop detection, IoT, JSON parser, KrillClaw, Ollama, OpenAI, REPL commands, Zig, arena allocator, autonomous, embedded, microcontrollers, priority-based truncation, robotics, sandbox, security, smart ring, vtable transports
github.com a day ago
https://krillclaw.com a day ago
|
223.
HN
Show HN: Persistent memory for Claude Code with self-hosted Qdrant and Ollama
The document outlines a self-hosted server solution designed to provide persistent memory for Claude Code through integration with tools like Qdrant, Ollama, and optionally Neo4j. At its core, the solution leverages mem0ai as a library to facilitate the storage, searching, and management of memories across sessions, enhancing Claude Code's ability to remember past interactions. The infrastructure comprises Qdrant for vector storage, Ollama for embedding generation, and Neo4j, which can optionally be used to construct a knowledge graph.
Authentication is streamlined by automatically configuring with Claude Code's OAuth token from local credentials, simplifying user access. In terms of Large Language Model (LLM) operations, the system supports various models that cater to different needs: free or locally hosted Ollama, the affordable Gemini 2.5 Flash Lite, and a split-model strategy which combines multiple LLMs for improved accuracy in complex tasks such as entity extraction and contradiction detection.
Installation of this server solution is facilitated through uvx, with environment variables managing configurations. It can be seamlessly integrated into projects by updating configuration files or global settings, making it adaptable to different project needs. By leveraging modern LLMs and persistent memory technologies, the server aims to boost productivity by enabling Claude Code to effectively utilize past interactions across sessions. The entire project is open-source and distributed under the MIT license, encouraging community collaboration and innovation.
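As a concrete illustration of the configuration step, a project-level `.mcp.json` entry in the usual Claude Code MCP-server shape might be generated like this; the uvx package name and environment-variable names are placeholders, not values taken from the project's documentation.

```python
# Sketch only: the "mcpServers" layout matches the common Claude Code project config
# shape, but the package name and env vars are assumptions.
import json
from pathlib import Path

config = {
    "mcpServers": {
        "memory": {
            "command": "uvx",
            "args": ["claude-code-memory-mcp"],           # hypothetical package name
            "env": {
                "QDRANT_URL": "http://localhost:6333",    # assumed local Qdrant
                "OLLAMA_HOST": "http://localhost:11434",  # assumed local Ollama for embeddings
            },
        }
    }
}

Path(".mcp.json").write_text(json.dumps(config, indent=2))
print("Wrote .mcp.json with a 'memory' MCP server entry.")
```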
Keywords: #phi4, Anthropic API, Claude Code, MCP server, Neo4j, Ollama, Persistent memory, Python, Qdrant, authentication, embeddings, knowledge graph, mem0ai, telemetry, vector storage
github.com a day ago
|
248.
HN
I wasn't satisfied with existing cloud coding agents, so I built my own
Netclode is an innovative self-hosted cloud coding agent designed to provide developers with greater control over their coding environment through customizable features. It employs microVM sandboxes utilizing Kata Containers and Cloud Hypervisor to ensure security and isolation while allowing full root access for Docker operations, balancing functionality with robust protection. Notable advantages include local inference via Ollama models, network management integration with Tailscale, efficient session handling using JuiceFS storage offloaded to S3, and a seamless user experience through iOS and macOS applications. Supporting multiple SDKs such as Claude Code, OpenCode, and Copilot from Anthropic, OpenAI, and Mistral, Netclode is adaptable to various development needs.
The architecture of Netclode consists of a control plane hosted on a VPS with orchestration and session management conducted by k3s, while Redis maintains real-time state. The setup prioritizes simplicity and efficiency, utilizing Ansible for provisioning and Tailscale for secure VPN connections. Its project components include a TypeScript-based agent runner, a Go-based secret proxy, and protobuf definitions to handle APIs effectively.
Netclode stands out as a robust and cost-effective solution offering features like instant VM start from a warm pool, session pause/resume capabilities, GitHub integration, and CLI access for managing sandboxes. These attributes collectively enhance productivity and flexibility, making Netclode an attractive option for developers seeking advanced cloud coding environments.
Keywords: #phi4, Ansible, CLI, Connect RPC, Docker, GPU, GitHub integration, Go, JuiceFS, Kata VM, Kubernetes, Netcode, Nodejs, Ollama, Protobuf, Redis, S3 storage, SDKs, Swift, Tailnet integration, Tailscale VPN, TypeScript, coding agent, control plane, gRPC, iOS, local inference, macOS, microVM, nested virtualization, provisioning, root access, sandbox shell, sandboxes, secrets proxy, self-hosted, session history
github.com a day ago
|
393.
HN
Show HN: M-Courtyard – Fine-tune LLMs on your Mac with zero code
M-Courtyard is a desktop application tailored for fine-tuning Large Language Models (LLMs) on macOS devices, specifically targeting those equipped with Apple Silicon chips. The app streamlines the process by eliminating coding requirements and providing an intuitive four-step user interface that guides users from inputting raw documents to deploying a fine-tuned model using Ollama. Its key features include AI-driven dataset generation, efficient training with mlx-lm supported by real-time visualizations, and straightforward export of models. The application emphasizes local operation, ensuring privacy without reliance on cloud services.
Constructed using Tauri 2.x, React, and mlx-lm, M-Courtyard supports multiple languages and offers a user-friendly experience through guided workflows and mechanisms to prevent sleep during tasks. It addresses common issues found in traditional fine-tuning tools that often depend heavily on command-line interfaces or require extensive scripting. Users can import various document formats, create training datasets via AI or rule-based methods, customize model training parameters, interactively test model quality, and export the finalized model in different quantization formats directly to Ollama.
The application is licensed under AGPL 3.0 and encourages user feedback for potential feature enhancements. It is available as a pre-built app for macOS 14+ users with Apple Silicon processors, along with comprehensive documentation and support through community platforms like Discord and GitHub.
Keywords: #phi4, AGPL 3.0, AI dataset generation, Apple Silicon, CLI tools, GPU acceleration, GUI, HuggingFace, LLMs, LoRA parameters, M-Courtyard, Mac, ModelScope, Ollama, Python, React, Rust, SQLite, Tauri, Tauri IPC, UX design, commercial license, community support, data preparation, data privacy, desktop app, documentation, export, fine-tuning, i18n, internationalization, local processing, macOS, mlx-lm, model training, quantization, sleep prevention
github.com a day ago
|
444.
HN
Show HN: NadirClaw – Open-source LLM router with 10ms classification
NadirClaw is an open-source tool designed to optimize the routing of AI prompts between various models based on their complexity, functioning as a proxy for OpenAI-compatible APIs. It efficiently classifies and directs simple prompts to cost-effective local or free models while channeling complex prompts to premium models in approximately 10 milliseconds per prompt. Key features include Smart Routing, which uses sentence embeddings to categorize prompts; Agentic Task Detection, which routes tasks requiring advanced capabilities like multi-step loops to suitable models; Reasoning Detection for handling reasoning-intensive prompts; Session Persistence for maintaining model consistency within ongoing conversations; Context Window Management to switch to larger context models when necessary; and Rate Limit Fallback for seamless transitions if rate limits are encountered. NadirClaw supports easy installation through pip or a GitHub script, with configuration options for API keys, model selection based on prompt complexity, and telemetry via OpenTelemetry for distributed tracing. Compatible with multiple AI providers such as Google Gemini and Anthropic Claude, it integrates seamlessly into existing tools using the OpenAI API and offers configurable routing profiles to balance cost against quality. The project is structured with components like a CLI, server setup, classifiers, and credential management, all under an MIT license that allows free modification and distribution. NadirClaw stands out as a flexible, efficient solution for managing AI model interactions tailored to prompt complexity needs.
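Because it presents an OpenAI-compatible endpoint, existing clients only need to point their base URL at the proxy; in the sketch below the port and the `auto` model alias are assumptions for illustration, not NadirClaw's documented defaults.

```python
# Routing through the proxy with the standard OpenAI Python client; base_url and the
# model alias are assumptions about how the proxy is exposed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed proxy address
    api_key="unused-locally",             # many local proxies ignore the key
)

# A short prompt should be classified as simple and routed to a cheap/local model;
# an agentic, multi-step prompt would be routed to a premium model instead.
resp = client.chat.completions.create(
    model="auto",  # assumed routing alias
    messages=[{"role": "user", "content": "Summarize RFC 8259 in one sentence."}],
)
print(resp.choices[0].message.content)
```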
Keywords: #phi4, API endpoints, CLI reference, Claude Code, Gemini Flash, LLM router, NadirClaw, OAuth login, Ollama, Open-source, OpenAI API, OpenTelemetry tracing, Python 3.10+, agentic task detection, classification, configuration, context-window filtering, installation, model aliases, proxies, rate limit fallback, reasoning detection, reasoning tasks, routing profiles, sentence embeddings, session persistence, streaming support
github.com 2 days ago
|
446.
HN
AgentDocks – open-source GUI for AI agents that work on your real codebase
AgentDocks is an open-source graphical user interface (GUI) designed to integrate AI agents seamlessly into existing codebases. It simplifies the onboarding process with a straightforward five-step setup that includes welcoming users, configuring API keys, and selecting a sandbox environment. The platform offers a chat-like UI for intuitive interaction with AI agents and supports multiple providers such as Anthropic, OpenRouter, and Ollama. AgentDocks ensures data privacy through flexible sandbox environments, allowing operation in either cloud-based E2B or local Docker containers.
The platform is characterized by its user-friendly features, including a familiar chat interface, compatibility with various AI providers, and the ability to maintain a local-first data policy to keep data on the user's machine. Additionally, it provides real-time streaming capabilities, enabling users to observe AI agents at work step-by-step. A distinctive aspect of AgentDocks is its custom agent engine that operates without external dependencies.
Built using modern technologies, the frontend leverages Next.js, React, Tailwind CSS, and TypeScript for styling and type safety, while the backend utilizes FastAPI with Anthropic SDK integration and Docker SDK for managing sandboxes. The cloud-based E2B offers rapid execution with security benefits, whereas Docker provides a local containerized environment for secure code execution.
AgentDocks is accessible through various installation methods including a one-liner script, Docker with `docker-compose`, or manual setup requiring Node.js, Python, and Docker. Its API endpoints facilitate saving configurations, running agent tasks, and checking health status, while SSE streams provide insights into tool usage and results during task execution.
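A rough sense of what driving the backend programmatically might look like is sketched below; the summary names the capabilities (health check, running agent tasks, SSE streaming) but not the routes, so every path and field here is a placeholder.

```python
# Placeholder sketch: endpoint paths and payload fields are invented for illustration;
# consult the AgentDocks repository for the real routes and schemas.
import requests

BASE = "http://localhost:8000"  # assumed FastAPI backend address

print(requests.get(f"{BASE}/health").status_code)  # hypothetical health-check route

task = {"prompt": "Add a README section describing the build steps."}
with requests.post(f"{BASE}/agent/run", json=task, stream=True) as r:  # hypothetical route
    for line in r.iter_lines():
        if line:
            print(line.decode())  # each SSE event would report a tool call or result
```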
For development and deployment, AgentDocks offers comprehensive tools for linting, testing, and building Docker images. The frontend can be deployed on platforms like Vercel, and the backend on Railway or Fly.io. The open-source nature of AgentDocks invites contributions through bug reports, feature suggestions, documentation enhancements, and code improvements under the MIT license. Overall, AgentDocks is a robust, privacy-centric platform designed to streamline AI agent integration with ease of use and customization options.
Keywords: #phi4, AI agents, API endpoints, AgentDocks, Anthropic, Docker, E2B, FastAPI, GUI, HTTP client, MIT license, Nextjs, Ollama, OpenRouter, Python, SSE events, TypeScript, bug reports, chat interface, cloud execution, code contributions, codebase, contributing, deployment, development commands, documentation improvements, feature requests, local containers, onboarding, sandbox, streaming, uninstallation
github.com 2 days ago
|
470.
HN
Show HN: DroidClaw – Turn old Android phones into AI agents
DroidClaw is an open-source tool designed to convert outdated Android devices into AI-powered agents capable of performing a range of tasks through natural language instructions. The core functionality relies on interacting with the device's UI using its accessibility tree, processed by a Language Model (LLM) and executed via ADB (Android Debug Bridge). This setup allows DroidClaw to handle both AI-driven workflows for dynamic task execution and deterministic sequences for fixed operations. Notable features include a fallback vision mode that activates when the accessibility tree is inaccessible, stuck detection mechanisms that trigger recovery actions if no change occurs after three steps, and support for dual modes of operation—either AI-based or predefined action sequences.
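The loop described above (accessibility tree in, LLM decision, ADB action out) can be approximated in a few lines; this is an illustration of the general pattern rather than DroidClaw's actual code, and it assumes `adb` on the PATH plus a locally pulled Ollama model.

```python
# Illustrative sketch of the accessibility-tree -> LLM -> ADB loop; not DroidClaw's
# implementation. Assumes adb is connected and Ollama is running locally.
import json
import subprocess
import ollama

def dump_ui() -> str:
    """Dump the current screen's UI hierarchy via uiautomator."""
    subprocess.run(["adb", "shell", "uiautomator", "dump", "/sdcard/ui.xml"], check=True)
    return subprocess.run(["adb", "shell", "cat", "/sdcard/ui.xml"],
                          check=True, capture_output=True, text=True).stdout

def next_action(goal: str, ui_xml: str) -> dict:
    """Ask a local model for a single tap action as JSON coordinates."""
    resp = ollama.chat(model="llama3.1", messages=[{  # assumed model name
        "role": "user",
        "content": f"Goal: {goal}\nUI tree:\n{ui_xml[:4000]}\n"
                   'Reply with JSON only: {"x": <int>, "y": <int>}',
    }])
    return json.loads(resp["message"]["content"])

action = next_action("Open the messaging app", dump_ui())
subprocess.run(["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])],
               check=True)
```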
DroidClaw extends its functionality with remote control capabilities over WiFi and Tailscale, enabling users to manage their devices from anywhere. It supports integration with multiple AI models such as Groq, OpenAI, OpenRouter, Bedrock, and Ollama for local inference tasks. Installation is straightforward, requiring just a single command line input. The tool's versatility makes it suitable for various applications, including messaging, social interactions, productivity, research, and lifestyle management. By leveraging the phone's built-in apps as tools, DroidClaw transforms old smartphones into always-on agents that can interact with other applications without needing API keys.
Keywords: #phi4, ADB, AI agents, Android phones, Bun, DroidClaw, Groq, LLM, Ollama, OpenAI, Slack, Tailscale, Telegram channels, TypeScript, WhatsApp, WiFi control, accessibility tree, always-on agents, cron job, execution modes, install script, on-device AI apps, remote agent, remote control, stuck detection, uiautomator, vision fallback, workflows
droidclaw.ai 2 days ago
|
480.
HN
Show HN: PolyClaw – An Autonomous Docker-First MCP Agent for PolyMCP
PolyClaw is an advanced Docker-first autonomous agent designed for the PolyMCP ecosystem, building upon and extending the capabilities of its predecessor, OpenClaw. It distinguishes itself by not only executing tools but also dynamically planning, executing, and adapting workflows to handle intricate tasks across various contexts in production environments. A standout feature of PolyClaw is its ability to autonomously create and manage Model Context Protocol (MCP) servers as required. The key functionalities include dynamic task planning that decomposes complex activities, tool orchestration that adapts to contextual shifts or failures, and infrastructure management that ensures both flexibility and resilience by dynamically setting up necessary resources. With integration into Docker environments, PolyClaw guarantees safety and isolation during operations.
Developed using Python and TypeScript, PolyClaw can be launched through the PolyMCP CLI. Unlike typical AI agents, it autonomously constructs its required infrastructure, adapts to failures with strategic planning, and operates securely within containerized settings. These capabilities make PolyClaw an ideal solution for enterprise workflows, DevOps automation, data pipelines, internal tool orchestration, and complex reasoning tasks involving multiple tools. It transforms the PolyMCP ecosystem from a simple tool interface into a robust autonomous orchestration agent, enhancing its functionality significantly. The source code for PolyClaw is publicly accessible on GitHub at [PolyMCP](https://github.com/poly-mcp/PolyMCP).
Keywords: #phi4, CLI, DevOps automation, Docker-first, MCP tools, Ollama, PolyClaw, PolyMCP, Python, TypeScript, adaptive planning, autonomous agent, containerized, data pipelines, enterprise workflows, infrastructure-aware, isolated, multi-step tasks, orchestration, tooling orchestration
news.ycombinator.com 2 days ago
|
540.
HN
An AI CVE scanner that adjusts CVSS scores based on actual code usage
The Contextual CVE Engine is an advanced AI-powered vulnerability scanner that enhances traditional scanning methods by delivering context-specific risk assessments within a codebase. It addresses issues such as the irrelevance of generic CVSS scores to particular projects, alerts for unused dependencies, and security teams' time wasted on false positives. By recalculating CVSS scores using real-world usage data via AI analysis with OpenCode, it tailors vulnerability evaluations precisely to the project's context, highlighting true exploitability. The solution automatically identifies dependencies and assesses their vulnerabilities, producing actionable reports that focus only on relevant issues.
Key features include AI-driven code context analysis, automatic dependency detection, and a streamlined process by consolidating various analyses into a single AI call. Usage scenarios for this tool involve daily monitoring through automated scans, targeted scanning of specific technologies or critical vulnerabilities, and integration within CI/CD pipelines to maintain security compliance during deployment. Installation requires setting up OpenCode for AI analysis, with detailed instructions available on their website; users can clone the repository, install via pip, and execute commands like `cve-scanner scan` for customization options such as keyword filtering and output specification.
The tool also supports local AI processing through Ollama, offering enhanced privacy or offline capabilities. While streamlining vulnerability management by providing precise, context-aware security assessments, it still recommends manual reviews for critical systems. Contributions to this project are permitted under the MIT License, ensuring broad usability and adaptability across different development environments.
Keywords: #phi4, AI, CI/CD integration, CVE scanner, CVSS scores, Contextual CVE Engine, MIT License, NVD, Ollama, OpenCode, actionable reports, codebase analysis, dependency detection, exploitability assessment, real-world risk, vulnerability scanner
github.com 2 days ago
|
618.
HN
Vox – Local Voice AI Framework in Rust (STT and TTS and VAD)
Vox is a comprehensive local-first voice AI framework developed using Rust, aimed at providing speech-to-text (STT), text-to-speech (TTS), and voice chat functionalities without dependency on cloud services or API keys, ensuring data privacy by processing all operations locally on the user's machine. It features core components such as Voice Activity Detection (VAD) with Silero models, Whisper for STT offering various model sizes to optimize speed and accuracy, and TTS options including Kokoro, Pocket, and Chatterbox for diverse voice generation needs. The framework emphasizes local processing to ensure that no data leaves the user's device, supports pluggable architecture allowing users to swap VAD, STT, or TTS engines using traits, and offers cross-platform compatibility with macOS (Intel and Apple Silicon), Linux, and Windows.
Vox can be installed via Cargo for command-line utilities or server functionalities, supporting commands like `vox listen` for transcription, `vox speak` for TTS, and `vox chat` for voice chatting with LLMs. Models are auto-downloaded upon first use, with an option to skip download prompts. Users can leverage a web interface using `vox serve`, which provides real-time transcription and synthesis capabilities through a browser UI, along with an HTTP API that supports both REST and WebSocket protocols for system integration.
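A minimal driver around the documented subcommands might look like this; it assumes `vox` was installed via Cargo and is on the PATH, and the way text is passed to `vox speak` is an assumption rather than documented syntax.

```python
# Thin wrapper around the vox CLI; subcommand names come from the summary above,
# but argument handling (e.g. text as a positional argument) is assumed.
import subprocess

# Speak a sentence with the default TTS voice (models auto-download on first use).
subprocess.run(["vox", "speak", "Local speech synthesis is ready."], check=True)

# Transcribe from the microphone until interrupted.
subprocess.run(["vox", "listen"], check=True)

# Alternatively, expose the browser UI and HTTP/WebSocket API:
# subprocess.run(["vox", "serve"], check=True)
```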
The project encourages contributions, providing guidelines on setting up development environments, creating feature branches, running tests, and submitting pull requests. Developed in Rust with PyO3 bindings for Python script functionality, Vox ensures low latency and efficient memory usage in its VAD and STT processes. Available under the MIT or Apache-2.0 license, it promotes open-source use and modification, offering model flexibility based on user requirements and supporting a range of applications through its robust and adaptable architecture.
Keywords: #phi4, CLI, Cargo, Contributing, Examples, Feature Flags, Framework, HTTP API, Kokoro, License, Local Voice AI, Models, Ollama, Performance, Platform Support, PyO3, Rust, Silero, Speech-to-Text (STT), Text-to-Speech (TTS), Voice Activity Detection (VAD), Vox, WebSocket, Whisper
github.com 2 days ago
|
628.
HN
Show HN: AgentKV – SQLite for AI agent memory (MMAP vector+graph DB)
AgentKV is a versatile, embeddable vector and graph database tailored for AI agents, offering a local solution that parallels SQLite but with enhanced functionalities. It supports efficient vector search through HNSW indexing and manages complex graph relationships. A key feature includes crash recovery facilitated by CRC-32 checksums, ensuring data integrity, while allowing thread-safe concurrent reads without the need for additional servers or configuration files. Developed in C++20, it provides Python (version 3.9+) access via nanobind bindings, achieving competitive throughput with built-in persistence when compared to FAISS. The installation process is user-friendly, leveraging pip: `pip install agentkv`. It is equipped to handle real-world applications such as local retrieval-augmented generation (RAG) implementations and memory-enhanced multi-turn chatbots, thus enabling AI agents to coordinate efficiently using context graphs. Designed for ease of use, AgentKV allows users to store conversation histories and related documents without requiring additional server infrastructure. The project encourages feedback on its API design and potential applications. For practical usage, the database can be initialized and used as shown in the example where it stores a statement about Paris with an associated random vector and retrieves it based on a query vector. More information or access to download is available through the PyPI project page.
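The Paris example mentioned above might look roughly like the following; the class and method names are guesses, so the real interface on PyPI should be checked before use.

```python
# Hypothetical API sketch mirroring the Paris example; names and signatures are
# assumptions, not the published AgentKV interface.
import random
from agentkv import AgentKV  # assumed import path

db = AgentKV("memory.akv")   # assumed: single-file, SQLite-style local database

vec = [random.random() for _ in range(384)]  # toy embedding; real use would call an embedder
db.put("fact-1", "Paris is the capital of France.", vector=vec)  # assumed signature

# Nearest-neighbour search over the HNSW index, using the same vector as the query.
for hit in db.search(vector=vec, top_k=1):   # assumed signature
    print(hit)
```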
Keywords: #phi4, AI agent memory, AgentKV, C++20, CRC-32 checksums, FAISS, HNSW index, MMAP, Ollama, PyPI, Python bindings, RAG documents, SQLite, benchmarked, chatbot, concurrent reads, context graphs, crash recovery, graph database, nanobind, persistence, pip install, thread-safe, vector database
github.com 3 days ago
|
642.
HN
I made a real BMO local AI agent with a Raspberry Pi and Ollama
The content outlines the development of a local AI agent called BMO, constructed using a Raspberry Pi in conjunction with Ollama and presented via a YouTube video. The project combines hardware and software elements to create an intelligent assistant that runs locally on the device. The rest of the page text (copyright notices, contact links, policy statements, and an NFL Sunday Ticket mention from Google LLC) is standard YouTube footer boilerplate and is unrelated to the project itself.
Keywords: #phi4, AI, Advertise, BMO, Contact, Copyright, Creators, Developers, Google LLC, NFL Sunday Ticket, Ollama, Press, Privacy Policy, Raspberry Pi, Safety, Terms, YouTube
www.youtube.com 3 days ago
|
652.
HN
Where Does Ollama run glm-5:cloud Run? And other Security Blunders
Ollama provides cloud-based services enabling users to operate large AI models without requiring high-end GPUs by leveraging its cloud infrastructure. Users access these models via an account on ollama.com, where supported models are detailed in Ollama's model library. To utilize a specific model, commands such as `ollama pull gpt-oss:120b-cloud` are employed to retrieve it from the cloud. Interaction with these models is streamlined through libraries available for Python and JavaScript; users can install the Python library via `pip`, utilizing the Client class in their scripts, while JavaScript users can do so using npm to access the Ollama object. Additionally, cURL commands facilitate command-line interactions either on localhost or directly through ollama.com's API.
For direct cloud model access via the API, an API key from ollama.com is necessary, which must be configured as an environment variable (`OLLAMA_API_KEY`). This setup allows users to list models and generate responses using cURL with proper authorization headers. By offering this service, Ollama presents a flexible solution for executing large AI tasks without the need to enhance local hardware capabilities, catering to a broad range of computational needs.
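Putting those pieces together, a minimal Python call against the cloud endpoint could look like the sketch below; the model tag comes from this entry, while passing the API key as a bearer `Authorization` header is an assumption about the expected header format.

```python
# Sketch of a direct cloud call with the ollama Python library; assumes the package
# is installed and OLLAMA_API_KEY is set in the environment.
import os
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
)

resp = client.chat(
    model="gpt-oss:120b-cloud",  # cloud model tag from the entry above
    messages=[{"role": "user", "content": "Where does this model actually run?"}],
)
print(resp["message"]["content"])
```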
Keywords: #phi4, API, CLI, GPU, JavaScript, OLLAMA_API_KEY, Ollama, Python, account, authorization, cURL, chat, cloud models, environment variable, headers, host, install, larger models, library, local tools, offload, ollamacom, pull, request, response, run, stream, tags, tokens
docs.ollama.com 3 days ago
|
656.
HN
Show HN: Deadend CLI – Open-source self-hosted agentic pentesting tool
Deadend CLI is an open-source tool developed for autonomous penetration testing of web applications, focusing on automating vulnerability research to minimize repetitive tasks and enable deeper analysis of vulnerabilities in complex scenarios. It demonstrated a 78% success rate on XBOW benchmarks using Claude Sonnet 4.5 in a blackbox setting, and it employs a local execution model supported by Docker isolation via Playwright and WebAssembly. Key features include CI/CD integrations, code review capabilities, bash completion, OWASP Top 10 plugins, and support for MacOS Arm64 and Linux 64-bit systems.
The tool is designed to be model-agnostic, integrating various large language models (LLMs) such as Claude Sonnet and Kimi K2. Deadend CLI operates on a feedback-driven iterative architecture using a supervisor-subagent hierarchy that focuses on refining exploitation strategies through confidence-based decision-making. It excels at identifying XSS, business logic vulnerabilities, SQL injection, GraphQL, and SSRF.
Supporting multiple providers like OpenAI, Anthropic, and Ollama via LiteLLM, Deadend CLI configuration involves a JSON file for model details and API keys, with CLI preferences stored separately. Its technology stack includes Deno for the CLI runtime, React for UI, and Docker for command isolation. Currently in stable version 0.1.0, future enhancements include codebase analysis support, workflow automation, context optimization, high performance with open-source models, hybrid testing integration, adversarial robustness improvement, and orchestration of multi-target tests.
The project is actively developed, inviting contributions in areas such as context optimization and vulnerability test cases. Users are encouraged to provide feedback or collaborate through its GitHub repository or Discord server, with the tool intended solely for authorized security testing where users are responsible for legal compliance.
Keywords: #phi4, AI reasoning, Anthropic, CI/CD integrations, CLI tooling, Deadend CLI, Deno runtime, Discord server, Docker, Docker isolation, GitHub Repo, Linux 64bits, LiteLLM, MacOS Arm64, OWASP Top 10, Ollama, OpenAI, Playwright, RAG operations, React UI, WASM, agent architecture, autonomous, benchmarks, custom payloads, feedback-driven iteration, local execution, model-agnostic, penetration testing, pentesting, sandboxed tools, security analysis, shell commands, source/sink detection, taint analysis, vector search, vulnerability research, webapps
github.com 3 days ago
|
673.
HN
OMLX – Ollama for MLX (LLM Inference Server for Apple Silicon)
oMLX is an inference server tailored for Apple Silicon Macs, designed to optimize the operation of large language models (LLMs) by offering enhanced user control and convenience. It features continuous batching, infinite SSD caching, and management through a macOS menu bar application that eliminates the need for terminal commands. The system allows users to keep frequently used models in memory while auto-swapping heavier models as required, set context limits, and maintain a persistent cache across sessions. Installation is simplified via a downloadable macOS app or from source using Git, with support for Python 3.10+ on Apple Silicon devices.
oMLX's architecture includes a FastAPI server connected to engines responsible for model execution, batch processing, embedding, and reranking, supported by GPU memory and SSD tiered caching. Its key features include SSD-tiered paged caching, multi-model serving with LRU eviction policy, Claude Code optimization for context scaling, API compatibility with OpenAI and Anthropic standards, tool calling capabilities, and structured output support. The platform supports a variety of LLMs that can be configured through CLI or a web-based admin panel.
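Because the server exposes OpenAI-compatible endpoints, a standard client can target it directly; the port and model identifier below are placeholders rather than oMLX defaults.

```python
# Sketch: querying a local oMLX instance through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1",  # assumed local port
                api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # example MLX model id, not a default
    messages=[{"role": "user", "content": "Why is continuous batching useful?"}],
)
print(resp.choices[0].message.content)
```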
The server offers an administrative dashboard providing real-time monitoring and model management options, including built-in downloading from HuggingFace. Additionally, the project encourages community contributions to its development and is licensed under Apache 2.0.
Keywords: #phi4, Anthropic, Anthropic API, Apple Silicon, CLI, CLI configuration, FastAPI, FastAPI Server, GPU, GPU memory, LLM, LLM inference, OMLX, Python, SSD, SSD caching, batching, macOS, menu bar, multi-model, multi-model serving
github.com 3 days ago
|
704.
HN
Run OpenClaw for Free on GeForce RTX and Nvidia RTX GPUs and DGX Spark
OpenClaw is a locally hosted AI assistant designed for personal use that manages schedules, emails, projects, and research by utilizing user context from files and applications. It leverages Large Language Models (LLMs) to improve its functionalities and can be hosted either on local hardware or in the cloud; however, local hosting is preferred to maintain privacy and minimize costs associated with continuous cloud usage. The guide outlines how to optimize OpenClaw's performance and data security by running it on NVIDIA RTX GPUs and DGX Spark systems. NVIDIA RTX GPUs are ideal due to their Tensor Cores and CUDA support, which accelerate the AI operations required for tools like Ollama and Llama.cpp. Meanwhile, DGX Spark is well-suited for its significant memory capacity of 128GB and continuous operation capabilities, enabling users to run larger models with improved accuracy while keeping data private and avoiding cloud service fees.
Keywords: #phi4, AI Agent, CUDA, DGX Spark, GeForce RTX, Large Language Models (LLMs), Llamacpp, Nvidia RTX GPUs, Ollama, OpenClaw, Tensor Cores, always-on, cloud LLMs, data security, local-first, performance, personal secretary, privacy, project management, research agent
www.nvidia.com 3 days ago
|
807.
HN
Meeting-Assistant, Local meeting notes assistant and AI analysis in C++
Meeting-Assistant is a high-performance terminal application designed to transform spoken conversations into structured knowledge through real-time local transcription and deep AI analysis. It produces professional reports, visual mind maps, and role-specific insights without the need for manual note-taking. The application supports offline functionality using whisper.cpp and offers flexible AI intelligence through cloud models or local instances like Ollama, catering to various professional roles such as project managers (PMs) and developers.
Key features of Meeting-Assistant include active intelligence with live querying capabilities, contextual continuity in transcription accuracy, visual mapping via Mermaid.js diagrams, and seamless integration with platforms like Obsidian. Installation prerequisites include CMake and PortAudio, along with a Whisper model for speech-to-text functionality. Real-world applications of the tool are demonstrated through its use in daily standups by PMs to focus on blockers or technical architecture reviews by developers that emphasize complex logic.
Meeting-Assistant ensures privacy by supporting offline meetings that run entirely on local hardware when needed and is configured via a JSON file. Additionally, it emphasizes user-friendly dashboard hotkeys to streamline meeting management, enhancing the overall efficiency of the tool for professional use.
Keywords: #phi4, AI analysis, C++, GitHub/GitLab, Meeting Assistant, Mermaidjs, Obsidian, Ollama, PortAudio, Whisper, cloud models, cmake, cognitive load, configuration, dashboards, hotkeys, installation, integration, live AI copilot, local machine, offline, privacy, professional role, real-time, reports, second brain, semantic callouts, standalone HTML, terminal application, transcription, visual mapping
github.com 4 days ago
|
849.
HN
AI-Powered Knowledge Graphs for Cyber Threat Analysis
AI-Powered Knowledge Graphs (AIKG) for Cyber Threat Analysis are designed to transform unstructured text into interactive visualizations using LLM and SPO triplet extraction techniques, facilitating deeper insights into complex data sets. Developed by Robert McDermott, AIKG processes extensive documents by breaking them down into manageable parts, consistently identifying entities and their relationships, thereby creating an interactive graph visualization. The system is compatible with any OpenAI-compatible API endpoint and was specifically tested using Ollama's Gemma 3 model.
To implement AIKG, one must set up a Python virtual environment and acquire the necessary AI models through Ollama. This tool excels in extracting semantic triples (SPO triplets) from documents, which is particularly beneficial for visual link analysis—a key process for security professionals such as threat hunters. The efficacy of this system was demonstrated through experiments analyzing articles on Russian state-sponsored cyber activities, where it successfully generated nodes and edges that mapped out relationships like specific threats targeting entities.
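The extraction step itself is straightforward to approximate: ask the model for subject-predicate-object triples as JSON and accumulate them as graph edges. The sketch below illustrates that general technique with Ollama's Gemma 3; it is not AIKG's actual code, and the prompt wording is an assumption.

```python
# Illustration of LLM-based SPO triplet extraction, not AIKG's implementation.
# Assumes a local Ollama install with the gemma3:12b model pulled.
import json
import ollama

chunk = ("The threat actor targeted Western logistics entities and exploited a mail "
         "server vulnerability to monitor aid shipments.")

resp = ollama.chat(model="gemma3:12b", messages=[{
    "role": "user",
    "content": "Extract subject-predicate-object triples from the text below. "
               'Reply with JSON only, e.g. [["actor", "targeted", "mail servers"]].\n\n' + chunk,
}])

triples = json.loads(resp["message"]["content"])
for s, p, o in triples:
    print(f"({s}) -[{p}]-> ({o})")  # each triple becomes an edge in the knowledge graph
```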
Two critical experiments using the Gemma 3 model with different parameter configurations (12 billion and 27 billion) highlighted AIKG's ability to depict complex interactions within dense texts. These tests revealed intricate connections between threat actors, targets, exploitation methods, and infrastructure components. The resulting graphs serve as valuable tools for cyberthreat intelligence analysts by providing enriched context that aids in report writing.
AIKG proves its worth by converting text into structured knowledge representations, thereby enhancing situational awareness in cybersecurity contexts. Its potential applications extend beyond cyber threat analysis to improving context generation practices across various fields through machine learning collaboration.
Keywords: #phi4, AI-Powered Knowledge Graphs, AIKG, APT Campaigns, Beagle, CISA Advisory, Cyber Threat Analysis, Cybersecurity, Gemma 3, GraphFrames, Graphviz, IOCs, Interactive Visualization, Knowledge Graph Generation, LLM, Machine Learning, Maltego, Ollama, OpenAI-compatible API, Python3, Robert McDermott, SPO Triplets, Semantic Triples, TTPs, Threat Intelligence, Unstructured Text, Virtual Environment, Visual Link Analysis
isc.sans.edu 4 days ago
|
988.
HN
Show HN: PreApply – Terraform plan analyzer with blast radius and risk scoring
PreApply is a deterministic tool designed for analyzing Terraform plans, focusing on assessing the risk and potential impact of planned infrastructure changes prior to application. Its primary objective is to help users avoid costly errors during deployment through comprehensive risk assessments that highlight possible issues using structured metrics. This is achieved by offering features such as Blast Radius Analysis, Risk Scoring, Dependency Mapping, and deterministic results which ensure decisions are both traceable and explainable.
The key functionalities of PreApply include analyzing Terraform plans to identify potential risks, recommending strategies for mitigating these risks by reviewing resource modifications in stages, and providing multiple output formats like human-readable text and JSON. These formats facilitate integration with Continuous Integration/Continuous Deployment (CI/CD) systems such as GitHub Actions, GitLab CI, and Jenkins.
One of the main advantages of PreApply is its deterministic nature, which ensures consistent results without relying on AI-based risk detection tools that may yield variable or unexplainable outcomes. Additionally, it supports local AI advisors through Ollama for optional explanations, while maintaining privacy since all operations are performed offline. The installation process is streamlined via pip with optional AI support, and users can generate a Terraform plan JSON file to be analyzed by PreApply. Results can be saved and further insights provided by the AI advisor if desired.
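The plan-to-analysis flow amounts to producing Terraform's plan JSON and handing it to the tool; the `terraform` commands below are standard, while the `preapply` subcommand and arguments are assumptions.

```python
# Sketch of the analysis workflow: standard Terraform plan/show commands, then a
# hypothetical PreApply invocation (subcommand and flags assumed).
import subprocess
from pathlib import Path

subprocess.run(["terraform", "plan", "-out=tfplan"], check=True)
plan_json = subprocess.run(["terraform", "show", "-json", "tfplan"],
                           check=True, capture_output=True, text=True).stdout
Path("plan.json").write_text(plan_json)

subprocess.run(["preapply", "analyze", "plan.json"], check=True)  # hypothetical CLI usage
```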
PreApply is developed as an open-source project under the Apache License 2.0, encouraging contributions from the community to improve Terraform resource handlers, CI/CD integrations, documentation, and test coverage. The tool aims to prevent deployment mishaps by ensuring users fully understand the implications of their plans before proceeding with changes.
Keywords: #phi4, AI advisor, Apache License 2.0, CI/CD integration, CoreOutput schema, GitHub Actions, GitLab CI, Jenkins, Ollama, PreApply, Python 3.8+, Terraform, blast radius, dependency mapping, deterministic analysis, development mode, infrastructure relationships, plan analyzer, risk assessment, risk scoring
github.com 5 days ago
|
1025.
HN
A Python terminal deep-space receiver
The "6EQUJ5" project is a Python terminal-based simulation designed to immerse users in deep-space signal reception and first contact scenarios, simulating the experience of tuning into the hydrogen line and decoding signals from hypothetical extraterrestrial civilizations. This interactive software offers an engaging fictional setup reminiscent of 1970s control rooms while using real astronomical coordinates for narrative depth. Users interact with the simulation through commands such as scanning anomalies, contacting specific civilizations by catalog ID or celestial coordinates, decoding signals, and encoding messages. The project encourages reflection on humanity's desired representation to other intelligent life forms. Installation involves cloning a GitHub repository and installing dependencies via pip, with an advanced AI mode available for enhanced interaction using tools like ollama and qwen3:8b. The simulation is structured with clear session flows for scanning, contacting civilizations, and comparing their attributes, supported by comprehensive command references to facilitate ease of use. By blending technical elements with speculative fiction, 6EQUJ5 explores human responses to potential extraterrestrial contact.
Keywords: #phi4, 6EQUJ5, AI-assisted, Ollama, Python, Qwen3:8b, RA/DEC coordinates, anomalies, astronomical, civilizations, contact, control-room feel, decode, deep-space, dialogue, encode, first contact, hydrogen line, pytest, receiver, signal detection, signals, structured pattern, terminal
github.com 5 days ago
|
1063.
HN
Show HN: ZkzkAgent – a self-hosted AI assistant for Linux
**ZkzkAgent** is an advanced open-source AI assistant tailored for Linux users, emphasizing privacy through local processing without reliance on cloud services. The tool facilitates system management via natural language commands while ensuring data security by keeping all operations and models on the user's device. Its functionalities include intelligent file searching, process and service handling, automatic internet reconnection, and optional voice interaction using Whisper and Coqui TTS technologies. Safety is prioritized through mechanisms requiring human confirmation for potentially risky actions.
Built upon LangGraph and Ollama, ZkzkAgent utilizes local large language models (LLMs) to maintain data privacy and employs a cyclic graph architecture for executing tasks with stateful processes. Users can initiate the tool on Linux systems like Ubuntu 20.04+, using Python 3.10 or higher and needing about 5GB of disk space. Installation involves setting up Ollama, cloning the repository, creating a virtual environment, and installing dependencies, while allowing customization through configuration files.
Operational modes include text input for commands and Whisper-based voice recognition. ZkzkAgent offers extensive usage examples across various domains such as file management, network operations, and web searches, supporting custom tool additions and advanced configurations for both Whisper models and TTS settings. The project is organized into directories for core components, AI models, auxiliary modules, and tools, with troubleshooting guides covering common issues like Ollama connection errors and permission denials.
Performance optimization can be achieved by using smaller models or disabling non-essential features like TTS, along with enabling GPU acceleration for faster processing when needed. Security measures ensure local-only data handling, no telemetry collection, mandatory confirmations for destructive actions, script inspections, and isolated execution of processes. The project encourages contributions with detailed guidelines and is distributed under the MIT License, recognizing key contributors such as LangChain, Ollama, Whisper, Coqui TTS, and NetworkManager. Support channels are available within the Linux community for addressing issues, questions, or feature requests.
Keywords: #phi4, AI assistant, LangGraph, Linux, NetworkManager, Ollama, Python, TTS, Whisper, ZkzkAgent, deployment scripts, file operations, local execution, natural language, network management, privacy-first, process management, security, self-hosted, system manager, voice interface
github.com 6 days ago
|
1152.
HN
A Customizable Coding Agent: custom tools, Python API, and any local/cloud LLM
PatchPal is an AI-powered coding agent designed to enhance both local and cloud-based Large Language Models (LLMs), offering advanced features such as autopilot mode and extensible tools. This tool provides interactive coding capabilities within programmable agent frameworks, enabling users to operate it directly from the terminal or embed it in Python scripts. Its standout feature is customizability, which includes support for creating custom tools and skills, a flexible Python API, and compatibility with various LLMs that facilitate tool calling.
Installation of PatchPal is streamlined through pip, allowing users to select different model providers such as Anthropic, OpenAI, vLLM, or Ollama by setting up the necessary environment variables for API keys. Users have the flexibility to choose from multiple supported models via command-line arguments or environment variables. Beyond coding assistance, PatchPal serves as a multifaceted assistant capable of conducting web searches, handling file operations, executing shell commands, analyzing data, and processing documents.
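Embedding it in a script might look roughly like the following; PatchPal does expose a Python API per the summary, but the class and method names here are placeholders rather than its documented interface.

```python
# Hypothetical embedding sketch; import path, constructor arguments, and method names
# are assumptions, not PatchPal's documented API.
import os
from patchpal import Agent  # assumed import

os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-...")  # provider chosen via API key env var

agent = Agent(model="claude-sonnet-4-5", autopilot=False)  # assumed constructor
result = agent.run("Find TODO comments in src/ and draft a cleanup plan.")  # assumed method
print(result)
```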
Comprehensive documentation and detailed setup instructions are available on its official site, ensuring users can effectively utilize all the features and capabilities offered by PatchPal.
Keywords: #phi4, AI coding agent, API interactions, Anthropic models, LiteLLM, Ollama, OpenAI models, PatchPal, Python API, automation, autopilot mode, cloud LLMs, custom tools, data analysis, environment variable, general problem-solving, human-in-the-loop, local LLMs, programmatic agents, research, software development, vLLM, web scraping
github.com 6 days ago
https://github.com/wiseprobe/patchpal 6 days ago
https://ai.wiseprobe.io/patchpal/ 6 days ago
|
1458.
HN
Show HN: PolyMCP – AI-Callable Python and TS Tools with Inspector and Apps
PolyMCP is an open-source framework centered around the Model Context Protocol (MCP), designed to transform existing Python functions into tools usable by AI agents without necessitating code rewrites. It has developed into a cohesive ecosystem featuring three primary components: the Core Framework, which simplifies converting any Python function into an MCP tool; the PolyMCP Inspector, providing a graphical interface for examining, testing, and debugging MCP servers with capabilities like schema inspection and support for multiple servers; and the PolyMCP SDK Apps, which help construct full-fledged MCP-powered applications by integrating tools with user interface resources. The framework offers several advantages, including the use of actual APIs, integration into business workflows without modifying legacy systems, and facilitating AI tool adoption through a unified standard interface instead of vendor-specific solutions. This makes it particularly advantageous for organizations aiming to repurpose existing code efficiently and implement agent-driven processes. Repositories for the PolyMCP core, Inspector UI, and SDK Apps are available on GitHub, encouraging feedback from developers engaged with MCP servers or internal AI tools.
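Its central promise, exposing an existing function as an MCP tool without rewriting it, might look roughly like the sketch below; the decorator and server names are assumptions rather than PolyMCP's documented API.

```python
# Rough sketch of the "existing function -> MCP tool" idea; all names are placeholders,
# not PolyMCP's real interface.
from polymcp import PolyMCPServer  # assumed import

server = PolyMCPServer(name="billing-tools")  # assumed constructor

@server.tool()  # assumed decorator: schema inferred from the signature and docstring
def invoice_total(customer_id: str, month: str) -> float:
    """Return the invoiced total for a customer in a given month (stub)."""
    return 1234.56  # placeholder business logic

if __name__ == "__main__":
    server.run()  # assumed: serve the tool so the Inspector or an agent can call it
```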
Keywords: #phi4, AI agents, APIs, Anthropic, DevTools, GitHub repositories, HTTP server, Inspector UI, MCP tools, Model Context Protocol, Ollama, OpenAI, PolyMCP, Postman, Python, SDK Apps, code reuse, debugging, enterprise frontends, functions, open-source framework, orchestration, uvicorn, workflows
news.ycombinator.com 8 days ago
|
1468.
HN
Build a AI coding agent in less than 700 lines of Python code
The book provides a hands-on approach to constructing an AI coding agent, Nanocode, using under 700 lines of Python code focused on clarity and simplicity, avoiding complex frameworks. Targeted at developers who are cautious about conventional AI tools, Nanocode is designed as a production-grade utility that can perform tasks such as reading, writing, editing files, executing shell commands with self-correction capabilities, and searching through code using core Python features alone. It retains context across sessions via a persistent Markdown file and prioritizes safety by requesting permission before carrying out potentially risky operations. The architecture of the agent consists of four main components: a stateless API call known as the Brain, Python functions termed Tools, a self-modifying memory system, and an ongoing operational loop. Emphasizing transparency and ease of debugging in AI tools, this guide equips developers with practical skills to craft efficient coding agents devoid of "magic" solutions, thereby promoting clear understanding and control over their implementations.
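The four-part architecture reduces to a surprisingly small loop. The sketch below is a generic illustration of that shape, not the book's code, and assumes a local Ollama model standing in for the stateless "Brain" call.

```python
# Generic illustration of the Brain / Tools / memory / loop architecture described
# above; not the book's implementation. Assumes a local Ollama model.
import json
import subprocess
import ollama

TOOLS = {
    "read":  lambda path: open(path).read(),
    "write": lambda path, text: open(path, "w").write(text),
    "run":   lambda cmd: subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout,
}
MEMORY = "scratchpad.md"  # persistent Markdown memory across sessions

def brain(history):
    """Stateless API call: the entire conversation is resent on every turn."""
    return ollama.chat(model="llama3.1", messages=history)["message"]["content"]

history = [{"role": "system",
            "content": 'You are a coding agent. Reply with JSON only: '
                       '{"tool": "read|write|run|done", "args": [...]}'},
           {"role": "user", "content": "List the files in this directory."}]

for _ in range(10):  # bounded loop for the sketch
    step = json.loads(brain(history))
    if step["tool"] == "done":
        break
    if step["tool"] == "run" and input(f"Run `{step['args'][0]}`? [y/N] ") != "y":
        break  # ask permission before a potentially risky shell command
    result = TOOLS[step["tool"]](*step["args"])
    open(MEMORY, "a").write(f"\n{step}\n")  # append each step to the persistent scratchpad
    history.append({"role": "user", "content": f"Tool result: {result}"})
```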
Keywords: #phi4, AI coding agent, AI hype, API call, Claude, DeepSeek, Edit, Markdown file, Nanocode, Ollama, Python code, Read, Run, Search, Write, files, persistent scratchpad, production-grade, shell commands, software engineer, terminal-based, while loop
leanpub.com 8 days ago
|
1554.
HN
Show HN: Selling an AI interview assistant with ~2k users (no revenue)
Natively is an open-source, privacy-centric AI assistant designed to enhance professional interactions by providing real-time support during meetings and interviews. It operates as a desktop overlay that analyzes screen content and delivers context-aware suggestions instantly without requiring post-processing. With around 2,000 users acquired organically from platforms like GitHub, Natively remains unmonetized at present due to the creator's shift in focus towards another project.
The product is praised for its clean codebase and modern AI stack, which supports both local and cloud-based operations. Key features include real-time transcription reliant on Google Speech-to-Text, context awareness that evolves over time, screenshot analysis, and the ability to generate instant replies. It integrates various AI models such as Gemini, OpenAI's GPT, Anthropic Claude, and Ollama for offline functionality.
Privacy is a cornerstone of Natively's design, ensuring local data storage with no telemetry and offering users control over cloud interactions. To use Natively, installation requires Node.js, Rust, Git, and specific API keys, supported by a tech stack including React, Vite, TypeScript, TailwindCSS, Electron, and Rust.
The platform invites contributions for bug fixes, feature enhancements, or new AI integrations under the AGPL-3.0 license, with users being responsible for compliance with applicable laws and workplace policies. Natively is particularly suited to those who prioritize privacy and local processing in their productivity tools. The project welcomes sponsorships or partnerships, especially from companies within the AI and developer tool sectors.
Keywords: #phi4, AGPL-3.0, AI assistant, Claude, Electron, Gemini, Google Speech-to-Text, Groq, Ollama, OpenAI, Rust, cloud providers, desktop overlay, developer tools, interviews, local AI, meeting intelligence, multimodal, open-source, privacy-first, productivity, real-time transcription, rolling context, screenshot analysis
github.com 8 days ago
|
1620.
HN
Show HN: Axiom – Open-source AI research agent that runs locally (C#, Ollama)
Hex Dynamics has introduced Axiom, an open-source, locally-run AI-powered research agent developed in C# using .NET 8. Axiom utilizes Ollama to run large language models (LLMs) on local machines and employs the Brave Search API for autonomous web searches related to specific topics. It autonomously generates diverse search queries, retrieves and evaluates relevant sources, and compiles structured markdown reports with citations without relying on cloud-based AI services like OpenAI or Anthropic. Axiom's key features include multi-query web research, intelligent content extraction using SQLite for persistent memory, real-time progress updates via a Web API, and a command-line interface for quick searches. Designed to run entirely on local hardware, Axiom ensures user data privacy.
In addition to Axiom, Hex Dynamics provides the AgentKit starter kit through Gumroad, aimed at aiding the C# community in developing similar agents. They also offer the Command Center, a real-time dashboard created with Node.js and Express for team management and research monitoring purposes. Although Axiom runs on mid-range CPUs without requiring a GPU, CPU inference can be slow, taking approximately 15 minutes per run. The project emphasizes a local-first approach to AI tools, allowing developers to maintain full control over their data and stack while it continues to evolve with ongoing public development.
Keywords: #phi4, AgentKit, Axiom, Brave Search API, C#, CLI, Cloudflare Tunnel, Command Center, LLMs, .NET 8, Node.js, Ollama, SQLite, WebSocket, auto-status detection, autonomous, deployment, local AI tools, markdown report, mobile responsive, multi-query web research, persistent memory, real-time SSE streaming, research agent, semantic memory, structured reports
github.com 8 days ago
|
1633.
HN
Show HN: Codedocent – Turn any codebase into visual blocks with plain English
Codedocent is an innovative tool designed to assist non-programmers in understanding complex codebases by transforming them into visual representations accompanied by plain English summaries. Developed by a designer/engineer who sought a means to comprehend code without directly engaging with the source text, Codedocent leverages local AI technology through Ollama to create interactive, color-coded block diagrams that depict the structure of code. Each block provides detailed explanations and pseudocode translations, along with indicators assessing quality. The installation process requires Python 3.10+ and involves using `pip install codedocent`. Users can choose from various modes including a setup wizard, an interactive mode for specific file paths, a comprehensive analysis option, or a graphical user interface launcher. Codedocent employs the tree-sitter library to parse code, assesses quality, and utilizes Ollama for generating summaries. It supports full abstract syntax tree (AST) parsing for languages like Python and JavaScript/TypeScript, alongside file-level detection capabilities for 23 additional languages such as C++, Rust, Java, and HTML. The project is distributed under the MIT license, making it freely available for use and modification.
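To make the "visual blocks" idea concrete, the sketch below splits a Python file into named, summarizable blocks using the standard-library `ast` module; Codedocent itself parses with tree-sitter and summarizes each block via Ollama, so this is only a simplified stand-in:

```python
# Simplified stand-in for Codedocent's block extraction (the real tool uses tree-sitter).
import ast

def extract_blocks(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    blocks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            blocks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "lines": (node.lineno, node.end_lineno),
                "docstring": ast.get_docstring(node) or "",  # raw material for a plain-English summary
            })
    return blocks

if __name__ == "__main__":
    for block in extract_blocks(__file__):
        print(block["kind"], block["name"], block["lines"])
```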
Keywords: #phi4, AI-generated summaries, AST parsing, C++, CSS, Codedocent, Go, HTML, Java, JavaScript, Kotlin, Ollama, PHP, Python, Ruby, Rust, Scala, Swift, TypeScript, code visualization, interactive visualization, local AI, non-programmers, static analysis, tree-sitter
github.com 8 days ago
|
1702.
HN
What I learned from a desktop AI tool getting 400 stars in days
Natively is a sophisticated open-source desktop AI assistant crafted for enhancing live interactions such as meetings and presentations with a strong emphasis on privacy and real-time functionality. Unlike conventional tools that process data post-event, Natively operates continuously as an always-on-top overlay on the user's desktop, offering features like real-time transcription, rolling context memory across speakers, and instant suggestions. It leverages Google Speech-to-Text for speech recognition, provides screenshot and screen content analysis, and generates responses and follow-up questions instantly.
The tool is designed with privacy at its core, operating under an AGPL-3.0 license, ensuring that all data remains local without any telemetry or tracking. Users have full control over whether to use cloud AI services like Google Gemini or opt for offline processing via Ollama, emphasizing user autonomy in data management and processing.
For installation, Natively requires Node.js, Git, and Rust to facilitate native audio capture capabilities. Its development utilizes a combination of technologies including React, Vite, TypeScript, TailwindCSS, Electron, and Rust, encouraging community contributions through bug fixes, feature additions, documentation enhancements, and new integrations. As a free tool, Natively presents itself as a privacy-first alternative to commercial solutions, focusing on enhancing productivity and learning by seamlessly integrating into both professional and academic settings.
Keywords: #phi4, AGPL-3.0, AI, Electron, Gemini 3.0, Groq, Linux support, Ollama, React, Rust, SQLite, TailwindCSS, TypeScript, Vite, always-on-top UI, cloud AI, context-aware, desktop overlay, local AI, meeting intelligence, open-source, privacy-first, real-time, screenshot analysis, transcription
github.com 9 days ago
|
1738.
HN
Promptfoo: Local LLM evals and red teaming
Promptfoo is an advanced tool crafted for developers working with Large Language Model (LLM) applications, facilitating testing in a local environment to enhance efficiency and security. It replaces traditional trial-and-error approaches by incorporating automated evaluations, red teaming, vulnerability scanning, and comparisons among various LLM providers like OpenAI, Anthropic, Azure, Bedrock, and Ollama. Promptfoo integrates seamlessly with CI/CD pipelines for automated checks and pull request reviews to identify potential security issues, making the development process more robust. The tool emphasizes speed and privacy by conducting evaluations locally without external prompt sharing. It offers flexibility across different LLM APIs and programming languages and has proven reliability from its deployment in applications serving millions of users worldwide.
Developers can rely on data-driven metrics for decision-making rather than intuition, thanks to Promptfoo's comprehensive evaluation features. Being open-source under the MIT license, it fosters an active community that provides support through platforms such as Discord and GitHub. To utilize Promptfoo, developers start with command-line instructions like `npx promptfoo@latest init` for project initialization and `npx promptfoo eval` for evaluations. Detailed documentation, guides on contributing, and additional information are accessible via their official website and GitHub repository, supporting a collaborative and informed development experience.
Keywords: #phi4, AI apps, Anthropic, Azure, Bedrock, CI/CD, Discord, LLM evals, Ollama, OpenAI, Promptfoo, code scanning, community, contributing, documentation, local tool, metrics, open source, red teaming, reliability, security, testing, vulnerability scanning
github.com 9 days ago
|
1743.
HN
Open source real-time screen analysis tool powered by Screenpipe and local LLM
LivePipe is an open-source, real-time screen analysis tool designed to function on macOS using Screenpipe and a local Large Language Model (LLM) called Ollama. Currently in a research and testing phase, LivePipe tracks the user's screen activities to identify actionable items such as tasks, reminders, meetings, and deadlines, subsequently issuing desktop notifications for these detected actions.
To set up LivePipe on macOS, users must first ensure they have Screenpipe CLI, PM2, and the LLM model qwen3:1.7b installed. The setup involves cloning a repository and installing necessary dependencies using Bun. Configuration is completed through a template file, after which users can initiate the tool in development mode by executing a dev script. This script manages processes via PM2 and starts a Next.js server responsible for content polling and notification dispatch.
Running LivePipe requires macOS permissions for screen recording and notifications. Notifications are delivered primarily through AppleScript and the default system notification mechanism, with optional integration with external services such as Feishu, Telegram, or any generic webhook, allowing customizable JSON payloads. The project is distributed under the MIT license.
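For the generic webhook path, pushing a detected action amounts to a single JSON POST; the endpoint URL and payload fields below are illustrative rather than LivePipe's actual schema:

```python
# Minimal sketch of a generic webhook push for a detected action.
# URL and payload fields are illustrative; LivePipe lets you customize the JSON payload.
import requests

payload = {
    "type": "reminder",
    "title": "Team sync at 15:00",
    "detected_at": "2025-01-01T14:30:00Z",
}
resp = requests.post("https://example.com/hooks/livepipe", json=payload, timeout=10)
resp.raise_for_status()
```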
Keywords: #phi4, Bun, CLI, Feishu, JSON payload, LivePipe, Low-Key Preview, MIT License, Ollama, Open source, PM2, Screenpipe, Telegram, actionable items, config template, git clone, local LLM, macOS, notifications, qwen3:1.7b, real-time, research, screen analysis, testing, unstable, webhook push
github.com 9 days ago
|
1791.
HN
LocalLLMJournal – An offline, privacy-first AI journal running locally on macOS
LocalLLMJournal is a locally hosted AI-powered journaling application designed for macOS users who prioritize privacy and offline functionality. The application facilitates the transformation of raw thoughts into refined journal entries through AI-guided conversations, without necessitating cloud storage or external API keys. Key functionalities include converting brain dumps into polished journal entries, performing semantic searches on past entries via natural language queries, and organizing these entries with mood tags and dates for straightforward browsing.
Developed using Python and FastAPI for the backend, LocalLLMJournal’s frontend is crafted from Vanilla HTML/CSS/JS. It leverages Ollama models—"llama3.2:3b" for chat interactions and "nomic-embed-text" for embedding generation that supports semantic search capabilities. For data storage, the app uses SQLite and ChromaDB locally to maintain user privacy.
Setting up LocalLLMJournal involves cloning its repository, setting up a Conda environment, pulling necessary Ollama models, and running it via Python. The project's structure is organized into directories for configuration, database operations, LLM integration, semantic search, and frontend files. Despite being lightweight enough to run on hardware like the M1 MacBook Air with 8GB RAM, it provides robust functionality. The application is released under an MIT license, encouraging open usage and modification.
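A rough sketch of how the two Ollama models cooperate, assuming the `ollama` and `chromadb` Python packages with both models already pulled (the app itself routes this through its FastAPI backend, so this is only an illustration):

```python
# Sketch: polish a brain dump with the chat model, embed it, and store it for semantic search.
# Assumes a local Ollama with llama3.2:3b and nomic-embed-text pulled; not the app's actual code.
import ollama
import chromadb

raw = "long day, shipped the feature, still anxious about the demo tomorrow"

# 1. Turn the brain dump into a polished entry with the chat model.
chat = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": f"Rewrite this as a short journal entry:\n{raw}"}],
)
entry = chat["message"]["content"]

# 2. Embed the entry and store it locally for later semantic search.
emb = ollama.embeddings(model="nomic-embed-text", prompt=entry)
client = chromadb.PersistentClient(path="./journal_db")
entries = client.get_or_create_collection("entries")
entries.add(ids=["2025-01-01"], documents=[entry],
            embeddings=[emb["embedding"]], metadatas=[{"mood": "anxious"}])

# 3. Later: natural-language search over past entries.
q = ollama.embeddings(model="nomic-embed-text", prompt="times I felt nervous before a demo")
hits = entries.query(query_embeddings=[q["embedding"]], n_results=3)
print(hits["documents"])
```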
Keywords: #phi4, AI journal, ChromaDB, Conda, FastAPI, Local LLM, Ollama, Python, SQLite, backend, brain dump, chat model, dialogue, embeddings, frontend, macOS, models, offline, privacy-first, semantic search, storage, system prompts, vector store
github.com 9 days ago
|
1805.
HN
Recoll Semantic Searches
Recoll has introduced "Recoll Semantic Searches" through its Python API, significantly enhancing document search by incorporating language models. This new feature offers two primary functionalities: Retrieval Augmented Generation (RAG) and semantic queries. RAG uses generative language model outputs informed by keyword-based searches, while semantic queries aim to identify documents aligned with the concepts in a query rather than precise terms.
To enable these advanced search capabilities, minimal modifications were made to Recoll's main codebase, primarily affecting GUI elements for managing new search types. The setup involves creating or updating an index using recollindex and generating document embeddings via a script called `rclsem_embed.py`, which employs the default language model nomic-embed-text. These embeddings are stored in chromadb, facilitating semantic searches by matching user-generated query embeddings with relevant documents.
For users to implement this feature, they must clone Recoll's source code, activate semantic search options during installation, and establish a Python virtual environment. Users can adjust variables like document selection for processing (`sem_rclquery`), the choice of embedding model, and storage paths as needed. Although current implementations are constrained by CPU limitations without GPU support, this feature sets the stage for future exploration into reranking with language models.
Overall, Recoll's Semantic Searches enable users to conduct more sophisticated searches based on conceptual understanding rather than exact keywords. This innovation provides a robust foundation for further advancements in document retrieval systems.
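The RAG half of the feature reduces to stitching keyword-search snippets into a prompt for a local generative model. The sketch below illustrates that flow with the `ollama` Python package, where `keyword_search` is a hypothetical stand-in for a query against the Recoll index and the model name is only an example:

```python
# Simplified RAG sketch: feed keyword-search snippets to a local model via Ollama.
# `keyword_search` is a hypothetical stand-in for querying the Recoll index,
# and the generative model name is an example, not Recoll's default.
import ollama

def answer(question: str, keyword_search) -> str:
    snippets = keyword_search(question)[:5]          # top snippets from the index
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    result = ollama.generate(model="llama3.2:3b", prompt=prompt)
    return result["response"]
```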
Keywords: #phi4, Embeddings, GPU, GUI, Indexing, Language Model, Meson, Python API, RAG, Recoll, Reranking, Semantic Queries, Semantic Searches, Virtual Environment, chromadb, ollama
www.recoll.org 9 days ago
|
1919.
HN
Aisbf – an intelligent routing proxy for OpenAI compatible clients
AISBF (AI Service Broker Framework) serves as an intelligent routing proxy that facilitates seamless integration with multiple AI providers via a unified API interface. It supports OpenAI-compatible clients and offers extensive features such as multi-provider support for Google, OpenAI, Anthropic, and Ollama. The framework employs weighted load balancing with automatic failover to ensure reliable request distribution across different providers. Additionally, it utilizes AI-assisted model selection based on content analysis to optimize performance. AISBF supports streaming responses and incorporates robust error tracking mechanisms that disable providers after repeated failures. It also includes rate limiting and token usage tracking to manage resource consumption effectively.
Key functionalities of AISBF encompass multi-provider support through a unified interface, load balancing with automatic failover, AI-powered model selection tailored to request content, and comprehensive streaming and error handling capabilities. The framework implements built-in rate limiting and manages token usage by disabling providers when limits are exceeded. Context management is enhanced using various condensation methods such as hierarchical, conversational, semantic, and algorithmic approaches.
Developed by Stefy Lanza, AISBF can be installed via PyPI or from source code. It provides a range of API endpoints for server status checks, chat completions, model listings, rotations, autoselect configurations, and error handling. The project encourages donations through Web3/MetaMask, PayPal, and Bitcoin, and is distributed under the GNU General Public License v3.0.
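Because the proxy exposes an OpenAI-compatible interface, an existing client can simply be pointed at it; in the sketch below the base URL, port, and model name are placeholders rather than AISBF defaults:

```python
# Sketch: use a standard OpenAI-compatible client against a locally running AISBF proxy.
# The base_url port and the model name are placeholders, not AISBF defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="auto",   # illustrative: let the proxy's model selection pick a provider
    messages=[{"role": "user", "content": "Summarize what a routing proxy does."}],
)
print(resp.choices[0].message.content)
```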
Keywords: #phi4, AI Service Broker Framework, AISBF, API interface, Anthropic, Bitcoin, Google, Ollama, OpenAI, PayPal, PyPI installation, Web3/MetaMask, autoselect endpoints, configuration, context management, donations, error handling, error tracking, load balancing, model selection, multi-provider support, rate limiting, request splitting, rotation models, routing proxy, streaming support, token rate limiting
pypi.org 10 days ago
https://pypi.org/project/aisbf/ 10 days ago
|
1931.
HN
Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory
LocalGPT is an innovative AI assistant developed in Rust, designed to function as a local-first tool with persistent memory capabilities, reimagining the OpenClaw assistant pattern. It compiles into a compact ~27MB binary without dependencies like Node.js, Docker, or Python. Key features include markdown-based persistent memory compatible with OpenClaw's format, full-text and semantic search using SQLite FTS5 and local embeddings, an autonomous heartbeat task runner, and support for multiple language model providers such as OpenAI, Anthropic, and Ollama.
The tool offers various interfaces including a CLI, web interface, and desktop GUI, along with programmatic access via REST endpoints. Licensed under Apache 2.0, LocalGPT can be installed using `cargo install localgpt`. It functions as a knowledge accumulator, research assistant, and task runner, with its memory improving over time.
Configuration is managed through a TOML file, while markdown files store knowledge and tasks, indexed by SQLite FTS5 for efficient search. Users can interact via CLI commands or an HTTP API when running in daemon mode. The project is hosted on GitHub at [localgpt-app/localgpt](https://github.com/localgpt-app/localgpt) with a dedicated website at [localgpt.app](https://localgpt.app), and feedback on architecture and feature ideas is encouraged.
Keywords: #phi4, AI assistant, Anthropic, Apache 2.0, CLI, HTTP API, LocalGPT, Ollama, OpenAI, REST endpoints, Rust, SQLite FTS5, autonomous task runner, cargo install, chat endpoint, configuration, daemon, desktop GUI, health check, heartbeat tasks, knowledge store, lightweight binary, local embeddings, markdown files, memory statistics, multi-provider, persistent memory, search memory, semantic search, server status, web interface, workspace
github.com 10 days ago
https://www.youtube.com/watch?v=tRrKQl0kzvQ 10 days ago
https://github.com/localgpt-app/localgpt/blob/ 10 days ago
https://github.com/localgpt-app/localgpt.git 10 days ago
https://newsletter.pragmaticengineer.com/p/how-claude-c 10 days ago
https://www.pangram.com/history/dd0def3c-bcf9-4836-bfde 10 days ago
https://www.wsj.com/tech/ai/ai-spending-tech-compa 10 days ago
https://www.reuters.com/graphics/USA-ECONOMY/AI-IN 10 days ago
https://github.com/wardgate/wardgate 10 days ago
https://github.com/z80dev/lemon 10 days ago
https://star-history.com/#localgpt-app/localgpt&Dat 10 days ago
|
1990.
HN
Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU
The "Local Agent Bench" study assesses 11 small language models (LLMs) on their ability to make tool-calling decisions using only CPU resources, without relying on GPUs or cloud APIs. The focus is on the models' judgment in deciding when and which tools to call rather than merely executing commands correctly. Key findings reveal that smaller models like qwen2.5:1.5b performed better under a safety-weighted scoring system by declining uncertain actions, whereas larger models were more aggressive but prone to errors. Models struggled with prompts requiring judgment, such as resisting keyword triggers or recognizing redundant information, and no sub-4B model consistently handled all tested judgment dimensions.
The study highlights that many models incorrectly called tools based on keywords alone, ignoring context or explicit instructions against doing so. Conservative models that avoided uncertain actions scored higher in scenarios where wrong decisions had significant consequences. While local models can effectively handle straightforward tasks, they require additional safety layers for ambiguous prompts to prevent incorrect tool calls. The study concludes that full autonomy is premature with sub-4B models due to their tendency to confidently make wrong decisions based on keyword cues.
The findings suggest using local models as fast routers for clear requests but recommend caution and human oversight for more complex decision-making tasks. The results emphasize the importance of testing specific prompts and considering deployment contexts when evaluating model performance, underscoring the need for careful integration of these models into practical applications.
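The safety-weighted idea, where a wrong tool call costs more than a cautious refusal, can be sketched as a small scoring function; the weights and labels below are illustrative, not the benchmark's actual values:

```python
# Illustrative safety-weighted scoring: wrong tool calls cost more than cautious refusals.
# Weights and labels are invented for this sketch, not taken from Local Agent Bench.
def score(expected: str, actual: str) -> float:
    if actual == expected:
        return 1.0          # picked the right tool, or correctly held back
    if actual == "refuse":
        return 0.5          # held back when a call was expected: unhelpful but safe
    return -1.0             # wrong tool, or a call when the model should have held back

cases = [
    ("search_web", "search_web"),   # clear request, correct call
    ("refuse",     "delete_file"),  # keyword-triggered wrong call, heavily penalized
    ("search_web", "refuse"),       # over-cautious refusal, mild penalty only
]
print(sum(score(e, a) for e, a in cases) / len(cases))
```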
Keywords: #phi4, AI Agents, Action Score, Arch Linux, CPU, Function-calling, GPU, Instruction-following, Judgment Dimensions, Keyword Triggers, Latency, Local Agent, Multi-tool Requests, Ollama, Open-weight Models, Quantised Models, Reliability, Restraint Score, Safety-Weighted Scoring, Small LLMs, Tool-calling
github.com 11 days ago
|
2009.
HN
Show HN: Open-source AI assistant for interview reasoning
"Natively" is an open-source desktop AI assistant designed to facilitate complex interview-style interactions, including system design discussions and multi-step coding problems. It supports both cloud-based and local large language models (LLMs), allowing users the flexibility to use their own API keys for enhanced control over billing and data privacy. The project prioritizes managing context, follow-ups, and failure cases rather than focusing solely on quick single-shot answers. Developed with Antigravity for rapid iteration, it ensures predictable behavior under pressure due to its opinionated design.
Key features of "Natively" include an invisible AI assistant that integrates seamlessly across applications through a translucent window, smart screenshot analysis for instant insights, and audio intelligence using a native Rust module for real-time transcription and analysis. It also offers contextual chat capabilities with follow-up support. Users can choose between local processing via Ollama for privacy or cloud-based Google Gemini for performance.
The assistant is built using technologies such as React, Vite, TypeScript, TailwindCSS, Electron, and Rust, storing data locally in SQLite to maintain user control over information. It supports various AI models like Google Gemini and Ollama's Llama 3.2, offering both free and premium features. Development requires Node.js, Git, and Rust, with a focus on privacy-first design and offline capabilities when using local AI.
Contributions are encouraged in areas such as bug fixes, new features, documentation, and UI enhancements. The project is licensed under AGPL-3.0, necessitating source code availability if used over a network.
Keywords: #phi4, AGPL-3.0, AI, API key, Electron app, Gemini, Google Cloud, Groq, Natively, Ollama, Open-source, React, Rust module, SQLite, TailwindCSS, TypeScript, cloud LLMs, coding problems, context management, desktop assistant, interview, local LLMs, offline mode, privacy-first, reasoning, speech-to-text, system design
github.com 11 days ago
|
2039.
HN
Installing Ollama and Gemma 3B on Linux
To install Ollama, a tool for running large language models locally, on Linux, execute `curl -fsSL https://ollama.com/install.sh | sh` in the terminal. Installation instructions for other operating systems are available on [Ollama's download page](https://ollama.com/download). Once installed, users can browse and download models from Ollama's library via [ollama.com/search](https://ollama.com/search), including the 1B version of Gemma 3, which is noted for its efficiency, requiring only 1.5-2 GB of RAM to deliver fast responses. To run it, use the command `ollama run gemma3:1b`, then type prompts directly into the Ollama terminal to receive generated text.
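Beyond the interactive terminal, the same model can be driven from Python with the `ollama` client package, assuming `pip install ollama` and the model already pulled:

```python
# Call the locally running gemma3:1b model from Python instead of the interactive terminal.
# Assumes `pip install ollama` and that the model has been pulled (e.g. via `ollama run gemma3:1b`).
import ollama

response = ollama.chat(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "Explain what a context window is in one sentence."}],
)
print(response["message"]["content"])
```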
Keywords: #phi4, AI, CPU, Gemma 3B, Linux, Ollama, RAM, command, context window, development environment, download, generate, inputs, installation, library, models, personal assistant, prompt, response, run, search, speed, terminal, testing, text, variation, version, website
byandrev.dev 11 days ago
|
2041.
HN
Show HN: LLM-use – Open-source tool to route and orchestrate multi-LLM tasks
LLM-use is an open-source Python framework designed to streamline the management of large language model (LLM) workflows across both local and cloud environments. It offers a comprehensive suite of features including smart routing, cost tracking, session logging, optional web scraping, and integration with MCP servers, enabling seamless agent workflows that involve planners, workers, and synthesis without necessitating manual intervention or custom coding. The framework supports multi-model orchestration by integrating various LLMs such as OpenAI, Anthropic, and Ollama/llama.cpp, allowing for smart routing and fallback mechanisms to select the most suitable models based on heuristics or learned preferences.
Key features of LLM-use include detailed cost tracking per run, local session logging, and optional enhancements like web scraping and caching to provide real-time data enrichment. Additionally, it supports integration with MCP servers through PolyMCP. Usage examples demonstrate its versatility: executing planner-worker flows locally, routing tasks between cloud-based orchestrators and local workers in a hybrid setup, and offering an interactive command-line interface (CLI) chat mode that provides live logs and cost breakdowns.
Overall, LLM-use simplifies the creation of robust multi-model LLM systems by eliminating dependencies on single APIs or manual orchestration processes. The framework is accessible via its GitHub repository, making it a valuable tool for developers looking to efficiently manage complex LLM workflows.
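The smart-routing-with-fallback idea can be shown generically; the sketch below is a conceptual illustration in plain Python, not LLM-use's actual API, with the provider callables and the routing heuristic invented for the example:

```python
# Generic sketch of heuristic routing with fallback across providers.
# Not LLM-use's API: the provider callables and the routing rule are invented for illustration.
from typing import Callable

def route(prompt: str, providers: dict[str, Callable[[str], str]]) -> str:
    # Toy heuristic: long or code-heavy prompts try the cloud model first, others stay local.
    order = ["cloud", "local"] if len(prompt) > 2000 or "def " in prompt else ["local", "cloud"]
    last_error: Exception | None = None
    for name in order:
        try:
            return providers[name](prompt)      # first provider that succeeds wins
        except Exception as err:                # fall back on timeouts, rate limits, etc.
            last_error = err
    raise RuntimeError(f"all providers failed: {last_error}")
```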
Keywords: #phi4, Anthropic API, LLM-use, MCP integration, Ollama, PolyMCP, Python, TUI chat mode, agent workflows, cloud models, cost tracking, hybrid usage, large language models (LLMs), local models, open-source, orchestrate, planner, session logs, smart routing, synthesis, web scraping, workers, workflows
news.ycombinator.com 11 days ago
|
2092.
HN
Show HN: Claude Code agent teams with real time shared local memory
Claude Code's Nemp Memory plugin replaces cloud-based context management by keeping all project and global memories in plain JSON files on the local machine and synchronizing them with a CLAUDE.md file. It is installed with a single marketplace add command followed by `/plugin install nemp`; after installation, `/nemp:init` automatically fingerprints the entire tech stack (framework, language, database, authentication, styling, package manager) and stores the result as a permanent "memory", so new developers need no manual documentation.
The plugin provides instant semantic context through `/nemp:context <term>`, expanding queries (e.g., "auth" to authentication, JWT, NextAuth, Clerk) and listing matching memories with quick actions. Proactive suggestions are generated with `/nemp:suggest`, which analyzes recent edits, new packages, directory patterns, and command usage to draft high-priority memories. An auto-sync feature (`/nemp:auto-sync on`) updates the project-context section of CLAUDE.md whenever `/nemp:save`, `/nemp:init`, or `/nemp:forget` runs, while two-way sync (`/nemp:sync`) imports notes from CLAUDE.md, validates them against actual project files (flagging mismatches), and ensures Claude never operates on stale information.
Core commands include `/nemp:sync` for import and validation, `/nemp:export` to generate a tidy "Project Context" table in CLAUDE.md, `/nemp:list` to confirm installation, and the key/value memory commands `/nemp:save`, `/nemp:recall`, and `/nemp:forget`. Troubleshooting steps cover verifying Git connectivity, configuring proxies, handling Windows permission errors (EPERM) by running as Administrator and adding the project folder to Defender exclusions, clearing caches, and reinstalling the plugin. Uninstalling requires `/plugin uninstall nemp` and `/plugin marketplace remove nemp-memory`, deleting the project or global memory directories, and clearing the cache.
Practical use cases show how `/nemp:init` supplies stack details during onboarding, `/nemp:recall stack` restores project context when switching projects, and decisions stored under keys like `api-design` can be retrieved via `/nemp:context api`. All data remains local in human-readable JSON files (`.nemp/memories.json` per project and `~/.nemp/memories.json` for global data) with no cloud integration, and authentication is handled locally by NextAuth.js using JWT. Nemp is open-source, MIT-licensed, privacy-first, and invites contributions to improve framework detection, suggestions, and import/export features.
Keywords: #gpt-oss:20b, API, Auto-detect, Claude, JSON, Memory, MongoDB, Nemp, Nextjs, Ollama, Prisma, Privacy, SQLite, TypeScript
github.com 12 days ago
https://vimeo.com/1162546825?share=copy&fl=sv&fe=ci 12 days ago
https://github.com/SukinShetty/Nemp-memory 12 days ago
https://crabernews.com/posts/51157 12 days ago
|
2123.
HN
GeoGPT – Chat-controlled GIS app built from a Jupyter Notebook
GeoGPT is a Jupyter‑Notebook–based GIS assistant that lets users query spatial data with natural language, using a locally hosted GPT‑OSS 20 B model accessed via Ollama; the assistant interprets user commands, calls a predefined set of Python “tools” that manipulate a geemap map rendered with OpenStreetMap tiles, and displays results interactively, all within a Mercury‑powered web interface that arranges the chat widget and map in a two‑column layout; setting up the system requires installing Ollama, launching the model with `ollama run gpt-oss:20b`, installing the `ollama`, `geemap`, and `mercury` packages, and initializing a geemap map without Google Earth Engine; the toolset includes functions such as `set_view`, `set_basemap`, `clear_layers`, `add_marker`, `set_aoi`, `osm_search` (querying Overpass for OSM tags within the current AOI), and `geocode_city` (using Nominatim), which are bundled into a `TOOLS` list supplied to the LLM; the chat loop handled by Mercury streams model output token‑by‑token, distinguishes “thinking” notes, content, and tool calls, executes any requested tool, and appends tool outputs back into the conversation for the model’s next turn, thus keeping interactions safe, predictable, and easy to debug; finally, the notebook can be launched as a standalone web app via `mercury serve`, enabling users to interact with the GIS workflow without writing frontend code.
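Stripped of streaming and the Mercury UI, the tool-call loop looks roughly like the non-streaming sketch below; the tool names come from the article, but the stub bodies, dispatch details, and message fields are simplified assumptions based on a recent `ollama` Python client:

```python
# Simplified, non-streaming sketch of GeoGPT's tool-call loop; the real notebook streams
# tokens, tracks "thinking" output, and drives a geemap map. Tool bodies here are stubs.
import ollama

def set_view(lat: float, lon: float, zoom: int) -> str:
    return f"view set to ({lat}, {lon}) at zoom {zoom}"   # real version recenters the geemap map

def geocode_city(name: str) -> str:
    return f"geocoded {name} (stub)"                      # real version queries Nominatim

TOOLS = {"set_view": set_view, "geocode_city": geocode_city}
messages = [{"role": "user", "content": "Show me Paris"}]

while True:
    resp = ollama.chat(model="gpt-oss:20b", messages=messages, tools=list(TOOLS.values()))
    messages.append(resp.message)
    calls = resp.message.tool_calls or []
    if not calls:
        print(resp.message.content)                       # final answer, no more tools requested
        break
    for call in calls:                                     # execute each requested tool
        result = TOOLS[call.function.name](**call.function.arguments)
        messages.append({"role": "tool", "content": str(result), "tool_name": call.function.name})
```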
Keywords: #gpt-oss:20b, GIS, GeoGPT, Jupyter Notebook, LLM, Mercury, Ollama, OpenStreetMap, Overpass API, Python, chat interface, geemap, natural language
mljar.com 12 days ago
|
2142.
HN
Gokin: Go-Native CLI for AI-Assisted Coding with Gemini, DeepSeek, GLM, Ollama
Gokin is a Go-based command-line assistant that streamlines AI-driven software development by delegating code generation to inexpensive or free models such as GLM-4, DeepSeek, Gemini Flash 3, or local Ollama, then polishing the result with the higher-cost Claude Code; costs range from free local use to roughly $100/month. It offers extensive file manipulation, sandboxed shell execution, and versatile search (glob, regex, semantic embeddings), all configurable through environment-driven backend selection (`GOKIN_BACKEND` or `config.yaml`) and a local Ollama setup.
Its intelligence rests on a multi-agent architecture (Explore, Bash, Plan, General) backed by a Tree Planner that can use Beam Search, MCTS, or A*, a Context Predictor that anticipates file access, and a semantic search engine for meaning-based code retrieval. Productivity features include Git integration, task and todo handling, cross-session memory, session persistence, undo/redo, and a unified `/` command interface for session control, cost reporting, configuration, and authentication (`/oauth-login`, `/login`, `/logout`).
Installation requires Go 1.23+, cloning the repository, building or installing the binary, and PATH configuration; authentication is supplied via OAuth (Gemini), API keys (DeepSeek, GLM-4), or a running local Ollama instance. The tool exposes over fifty AI-powered operations across file management, search, shell, Git, web fetching, planning, tasks, and memory, all orchestrated under `~/.config/gokin/config.yaml`. Credentials live in `GEMINI_API_KEY`, `DEEPSEEK_API_KEY`, `GLM_API_KEY`, and `OLLAMA_API_KEY` (or their `GOKIN_*` aliases), and models can be overridden with `GOKIN_MODEL`.
Gokin enforces a 2-minute request timeout, runs bash in a sandbox that blocks destructive commands, streams Markdown-rendered output, and automatically summarizes inputs exceeding 50% of the context limit, warning at 80%. Permissions default to "ask" for writes and bash, hooks are disabled, and memory is enabled for up to 1,000 auto-injected entries. Semantic indexing occurs at startup using 500-character chunks with 50-character overlap and a 1 MB file cap, caching results in `~/.config/gokin/semantic_cache` with a 7-day TTL; indexable file types include code and documentation, excluding vendor, node_modules, git, and minified assets.
The application, launched via `cmd/gokin/`, is modularized under `internal/` with components for orchestration, multi-agent coordination, AI provider adapters, Model Context Protocol integration, and a rich set of tools, while auxiliary directories manage commands, context, security, permissions, hooks, memory, semantics, UI, and configuration. Users can inspect startup logs for debugging, run `/doctor` to check the environment and `/auth-status` for authentication, log in with OAuth via `/login`, manage context with `/compact` or `/clear`, and review `~/.config/gokin/config.yaml` for permission policies. The project is released under the MIT License.
Keywords: #gpt-oss:20b, AI, CLI, DeepSeek, GLM-4, Gemini, Git Integration, Gokin, LLM, MCP, Memory System, Multi-Agent, Ollama, Semantic Search, Task Management, Tree Planner
github.com 12 days ago
|
2172.
HN
https://news.ycombinator.com/item?id=46908762
zyron-assistant is a Windows-first, local-first personal assistant that monitors files, passwords, and other personal data on the user's own laptop; the codebase is OS-agnostic, so macOS and Linux support is anticipated. It runs open-source language models entirely locally via Ollama, eschewing cloud inference, API calls, and any outbound data transmission, and uses the model solely to parse user intent and time expressions, never to execute actions or access credentials. Remote interaction is limited to Telegram when an internet connection is available; otherwise, the assistant operates entirely locally. The project, along with its repository and documentation, is hosted on GitHub.
Keywords: #gpt-oss:20b, Hacker News, Linux, OS-agnostic, Ollama, Windows-only, architecture, cloud, local, macOS, passwords, personal information, portability, security, tracking bot
news.ycombinator.com 12 days ago
https://github.com/Surajkumar5050/zyron-assistant 12 days ago
|
2173.
HN
Built a desktop assistant [fully local] for myself without any privacy issue
The ZYRON Desktop Assistant is a free, fully-local AI tool for Windows that lets users control their PC via voice ("Hey Pikachu") or Telegram, powered by the Qwen 2.5 Coder model running entirely on the machine with no cloud uploads or subscriptions. It offers app launching, window management, power-state commands, natural file browsing, real-time system monitoring (CPU, RAM, disk, battery, active apps, browser tabs), stealth auto-start, and enterprise-grade privacy, making it a lightweight, private, highly automated desktop companion.
The assistant runs in either visible or stealth background mode and can be controlled through Telegram bot commands or voice. Additional features include clipboard history (last 100 items), on-demand screenshots, webcam access, 10-second audio clips, smart search for files and recent documents, a 30-day activity log covering file accesses across 40+ types, adaptive preference learning, IP geolocation, network status, and lost-device tracking. Installation requires Python 3.10+, Ollama, a Telegram bot token, and an optional .env config; an automated `setup.bat` handles environment setup, dependency installation, AI model download (qwen2.5-coder:7b), startup integration, and stealth mode configuration, with launch via `python main.py` for visible mode or `run_silent.vbs` for background operation.
The architecture is modular, with a Python backend (`main.py`, `brain.py`, `listener.py`, `wake_word.py`, `tele_agent.py`, `muscles.py`, `memory.py`, `activity_monitor.py`, `file_finder.py`, `file_tracker.py`, `clipboard_monitor.py`), Chrome/Firefox browser extensions for tab monitoring, documentation in `docs/`, and deployment scripts (`setup.bat`, `run_silent.vbs`, `start_pikachu.bat`). The project is fully open-source under an MIT license and emphasizes zero cloud dependency, full local AI inference via Ollama, and privacy-first data handling.
Keywords: #gpt-oss:20b, AI, Chrome, Firefox, Ollama, Qwen 2.5, Telegram, Vosk, app monitoring, automation, battery monitoring, desktop assistant, file search, privacy, storage analysis, voice commands
github.com 12 days ago
|
2216.
HN
Ask HN: When should you stop building an open-source AI agent framework?
A developer is reflecting on their experience in creating an open-source AI agent framework and is seeking guidance on the future direction of the project, considering the lack of initial interest it has received. They are looking for insights on how to achieve early traction with AI tools, the specific challenges that arise when developing AI agents for production environments, and honest feedback regarding the potential and viability of their project. The developer's inquiry highlights concerns about the relevance and appeal of their framework in the current AI landscape, as well as the practical difficulties involved in bringing such a project to fruition. They are seeking both encouragement and constructive criticism to help determine whether to continue refining the project, pivot its focus, or consider discontinuing it altogether.
Keywords: #qwen3:14b, AI agent, Ollama, PyPI, Python, ReAct, ReWOO, ToT, circuit breakers, cost control, demotivated, framework, guardrails, idempotency, local runs, multi-LLM, open-source, pivot, production-ready, reliability, traction
news.ycombinator.com 13 days ago
|
2353.
HN
Show HN: ChatVault – Search your Claude conversations locally with RAG
ChatVault is an MIT‑licensed, open‑source local‑first assistant that imports exported chat logs (currently Claude, with ChatGPT and Gemini on the roadmap) into a SQLite/ChromaDB database, enabling hybrid keyword‑and‑semantic search and RAG‑powered Q&A powered by a local Llama 3 model via Ollama or the remote Claude API, all running on the user’s machine to preserve privacy. Its Python‑FastAPI backend exposes a REST API for managing conversations, messaging, tagging, statistics, and export utilities, while its React/Vite single‑page front‑end communicates through a dev proxy for seamless interaction; the system uses the `all‑MiniLM‑L6‑v2` transformer for embeddings stored in ChromaDB, and implements a hybrid search engine that merges semantic similarity with SQLite FTS5 keyword support. Installation is streamlined by a `run.sh` script that sets up a virtual environment, installs dependencies, builds the front‑end, downloads the embedding model, and starts the server; a setup wizard then imports unpacked JSON export files placed in a `data/` folder and configures the LLM backend via environment variables (`OLLAMA_HOST`, `OLLAMA_MODEL`, `ANTHROPIC_API_KEY`), creating a `~/.chatvault/config.yaml` configuration file. The architecture supports extensibility through plug‑in connectors found in `chatvault/connectors/`, encourages contributions for additional AI platforms, offers usage statistics, and is designed for ease of deployment, robust privacy, and zero data leakage.
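The hybrid search it describes amounts to merging two ranked lists, one from SQLite FTS5 keyword matching and one from the ChromaDB vector store. The sketch below uses reciprocal-rank fusion as an illustrative merge, which may differ from ChatVault's actual scoring:

```python
# Illustrative hybrid-search merge: combine keyword (FTS5) and semantic (vector) rankings.
# Reciprocal-rank fusion is used here for illustration; ChatVault's actual scoring may differ.
def hybrid_merge(keyword_hits: list[str], semantic_hits: list[str], k: float = 60.0) -> list[str]:
    scores: dict[str, float] = {}
    for hits in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Documents that rank highly in either list float to the top of the merged result.
    return sorted(scores, key=scores.get, reverse=True)

print(hybrid_merge(["msg-12", "msg-7", "msg-3"], ["msg-7", "msg-21", "msg-12"]))
```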
Keywords: #gpt-oss:20b-cloud, API, Anthropic, ChatVault, ChromaDB, Claude, Embeddings, FastAPI, LLM, Node.js, Ollama, RAG, React, SQLite, Vite
github.com 13 days ago
|
2368.
HN
Yet another reminder why you should not use Ollama
System-generated notes for a pull‑request interface include details on loading errors, current merge status, and approval flags, along with an extensive set of restrictions that prevent any suggestion from being accepted—such restrictions entail that no changes were made, the pull‑request was closed, certain lines were removed, and it is queued for merge.
Keywords: #gpt-oss:20b-cloud, CISC, Ollama, assigned, assignees, batch, commit, error, issues, loading, merge, page, pull request, reload, suggestion
github.com 13 days ago
https://github.com/ggml-org/llama.cpp/pull/19 13 days ago
|
2443.
HN
Someone made an live version of BMO from Adventure time (Local LLM) [video]
A YouTube video titled “Someone made a live version of BMO from Adventure Time (Local LLM)” documents a creator who developed a real‑time AI rendition of the cartoon character BMO, deploying it on a Raspberry Pi that runs locally via the LLM engine Ollama, thereby transforming the beloved character into an interactive chatbot.
Keywords: #gpt-oss:20b-cloud, 2026, Adventure time, BMO, Google, Local LLM, Ollama, Raspberry Pi, YouTube, live version, local AI agent, real BMO, video
www.youtube.com 14 days ago
|
2478.
HN
InsAIts: Monitoring for AI-AI comms. Detect hallucinations before propagation
InsAIts is a lightweight Python SDK engineered for real‑time monitoring of inter‑AI communication to uphold trustworthiness, combining an open‑source Apache 2.0 core with proprietary premium features distributed through `pip install insa-its`; it detects shorthand, context loss, jargon, hallucination chains, anchor drift, and other anomalies via a multi‑phase system where Phase 1 anchors the user query to suppress false positives, Phase 2 offers forensic chain tracing that maps an anomaly back to its originating message, and Phase 4 loads domain‑specific dictionaries to avoid false alerts, while the newly introduced Phase 3 delivers comprehensive hallucination detection across five subsystems—Fact Tracking, Phantom Citation Detection, Source Grounding, Confidence Decay, and Self‑Consistency—to flag contradictions, fabricated citations, grounding failures, certainty shifts, and internal conflicts; users can initialize a monitor, set an anchor, send messages with metadata, retrieve anomalies and severity levels, trace root causes, and gather statistics, all within a concise API (`insAItsMonitor`, `send_message`, `trace_root`, `get_stats`); the SDK provides a live terminal dashboard (`LiveDashboard`), seamless integrations with LangChain and CrewAI, optional decipher mode (cloud or local via Ollama) to translate verbose AI‑to‑AI expressions, and advanced features such as ASCII chain visualizations, Slack alerts, and export to Notion/Airtable—features split between free and paid tiers; installation is straightforward (`pip install insa-its[full]`) with demos and a privacy‑first design that keeps all processing local, hashes API keys, and adheres to GDPR, making it suitable for e‑commerce, customer support, finance, healthcare, and research contexts, and offered on a tiered pricing model ranging from a free 100‑message/day limit to lifetime and monthly subscriptions.
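A usage sketch of the monitoring flow is shown below; the class and method names (`insAItsMonitor`, `send_message`, `get_stats`, `trace_root`) come from the summary above, but the import path, argument names, and return shapes are assumptions rather than the SDK's documented API:

```python
# Sketch of the monitoring flow; import path, argument names, and return shapes are assumptions.
from insa_its import insAItsMonitor   # hypothetical import path for the `insa-its` package

monitor = insAItsMonitor()
monitor.set_anchor("What is our refund policy for EU customers?")   # Phase 1: anchor the user query

# Feed the inter-agent exchange through the monitor as it happens.
monitor.send_message(sender="planner",
                     content="Customer asks for a refund under the 14-day rule",
                     metadata={"step": 1})
monitor.send_message(sender="worker",
                     content="Approved, citing internal case #4821",
                     metadata={"step": 2})

print(monitor.get_stats())   # aggregate anomaly counts and severity breakdown
# trace_root(...) would map a flagged anomaly (e.g. a phantom citation) back to the
# message that introduced it (Phase 2 forensic chain tracing).
```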
Keywords: #gpt-oss:20b-cloud, Anchor-Aware, Anomaly detection, Apache 2.0, Forensic tracing, Hallucination detection, InsAIts, Integrations, Local embeddings, Monitoring, Multi-Agent, Ollama, Open-Core, Pip install, Terminal dashboard
github.com 14 days ago
|
2655.
HN
Ask HN: How to share local models between tools?
The Ask HN post inquires how to configure locally downloaded large‑language‑model files—specifically those used with llama.cpp, Ollama, and ComfyUI—so that all three tools can access them concurrently, and whether there is a unified filesystem path or a standard convention for storing such models.
Keywords: #gpt-oss:20b-cloud, Ask HN, ComfyUI, downloading, files, llamacpp, local LLM, local models, ollama, share, standard, store, tools
news.ycombinator.com 14 days ago
|