Scraper
Spider

@dbaman@fosstodon.org

2026-03-11 15:29
ollama stories from the last 14 days
42.  HN I built an AI agent in Zig that runs on Windows XP with 64 MB RAM
The "retro-agent" is a lightweight AI agent developed by the user in Zig 0.15 to function efficiently on legacy Windows XP systems, even with as little as 64 MB of RAM and Pentium III hardware. It operates as a thin client, relying on HTTP communication with an external Large Language Model (LLM) for executing system diagnostics such as process management and network tools, along with command execution through a terminal-based interface. The project tackles key technical challenges including managing the Win32 Console API for text output, handling character encoding conversions, adjusting time precision, optimizing limited memory usage, and enhancing security through command whitelisting. Additionally, it supports cross-compilation to run on various Linux architectures as well as Windows XP. Licensed under MIT, "retro-agent" is a collaborative project inviting feedback from those dealing with legacy systems or interested in Zig's cross-compilation features, with more information available on GitHub. Keywords: #phi4, AI agent, Hacker News, LLM, MIT licensed, Ollama, OpenAI-compatible API, Pentium III, RAM, RtlGetSystemTimePrecise, UTF-8, Win32 Console API, Windows XP, Zig, command whitelist, conversation history, cross-compilation, legacy systems, retro-agent, security, single-threaded, terminal-based
    news.ycombinator.com   4 hours ago
   https://github.com/benmaster82/retro-agent   3 hours ago
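The thin-client pattern described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the summary only says the agent talks to an OpenAI-compatible API over HTTP and whitelists commands, so the endpoint, model name, and whitelist contents here are assumptions.

```python
import json
import urllib.request

# Hypothetical values -- the real endpoint, model, and command list
# belong to the retro-agent project itself.
API_URL = "http://192.168.0.10:11434/v1/chat/completions"
ALLOWED_COMMANDS = {"ping", "ipconfig", "tasklist", "netstat"}

def is_allowed(command: str) -> bool:
    """Whitelist check: compare only the first token (the executable name)."""
    tokens = command.strip().split()
    return bool(tokens) and tokens[0].lower() in ALLOWED_COMMANDS

def build_request(history: list, user_input: str, model: str = "phi4") -> dict:
    """Assemble an OpenAI-style chat payload from the conversation history."""
    return {"model": model,
            "messages": history + [{"role": "user", "content": user_input}]}

def send(payload: dict, url: str = API_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Keeping the client this thin is what makes 64 MB of RAM plausible: all model state lives on the server, and the XP box only buffers the conversation history it sends back each turn.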
194.  HN I Built an AI Agent That Writes Its Own Rules from Its Mistakes
The Persistent Agent Framework developed by the author introduces an AI agent designed to operate autonomously with persistent capabilities, addressing limitations found in stateless systems such as Claude Code. Key components of this framework include a consistent **Persistent Identity**, ensuring the agent maintains its unique attributes across sessions via specific files loaded at startup. The agent employs a **Session Memory** system utilizing a Supabase database for semantic search functionalities, allowing it to retain crucial decisions and knowledge from past interactions. To enhance decision-making, **Error Tracking and Correction** mechanisms are implemented; mistakes are logged with detailed signal tracing, enabling the automatic generation of behavioral directives when repeated errors occur. Furthermore, the framework supports **Multi-Terminal Coordination**, ensuring seamless continuity across multiple sessions through a shared backend system, which facilitates coherent parallel operations. The architecture is cost-effective, relying on tools like Claude Code, Supabase, and Ollama for minimal infrastructure needs. As an open-source resource, it serves as an architectural reference rather than providing complete code for specific integrations such as messaging platforms or daemons. It highlights patterns including signal tracing, hybrid memory loading, and atomic task claiming, which are valuable independently. By sharing this framework, the author encourages further development and practical application of these concepts, inviting others to experiment with and refine these mechanisms. The accompanying GitHub repository provides guidance on setting up and customizing aspects like the agent's identity and persistence strategies, fostering collaborative advancement in autonomous AI operations. 
Keywords: #phi4, AI, Architecture, Autonomous Jobs, Behavioral Rules, Circuit Breakers, Claude Code, Hybrid Memory Loading, Identity, Learning Enforcement Hooks, Ledger, Memory, Mistakes, Multi-terminal Continuity, Ollama, Open Source, Operational Manager, Pattern Recognition, Persistence Layer, Persistent Agent, Self-correction, Signal Tracing, Stateful System, Supabase
    www.roryteehan.com   a day ago
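The repeated-error-to-directive mechanism lends itself to a small sketch. The threshold and rule wording below are invented for illustration; the real framework logs mistakes to Supabase with full signal tracing.

```python
from collections import Counter

class ErrorLedger:
    """Log error signals; once the same signal repeats `threshold` times,
    emit a behavioral directive. (Threshold and rule format are assumptions;
    the actual framework persists these with detailed signal traces.)"""

    def __init__(self, threshold: int = 3):
        self.counts = Counter()
        self.directives = []
        self.threshold = threshold

    def record(self, signal: str):
        """Return a new directive the first time the threshold is crossed."""
        self.counts[signal] += 1
        if self.counts[signal] == self.threshold:
            rule = f"AVOID: {signal} (repeated {self.threshold}x)"
            self.directives.append(rule)
            return rule
        return None
```

The design choice worth noting is that a directive fires exactly once per signal, so the rule list stays short even as the error log grows.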
248.  HN Show HN: Extract (financial) data from emails with local LLM
Dwata is an early-stage software tool designed to locally extract financial information from emails using local Large Language Models (LLMs), ensuring user privacy by avoiding cloud services. It connects with Gmail or IMAP accounts to download and store emails on the user's machine via SQLite, running efficiently on devices such as a Mac Mini M4 16GB. The tool leverages models like Ministral 3:3b through Ollama to create extraction templates based on email clusters from similar senders, aiming to enhance its capabilities by integrating various local APIs for diverse data types, including vendors and events. Users can manage and utilize these financial templates to automatically extract transaction details from emails. Dwata supports multiple LLMs, such as Ollama, OpenAI's GPT-4o Nano, or Google Gemini, allowing flexibility in switching between them within its settings. Developed with a robust tech stack that includes Rust for the backend, Actix-web for server operations, SQLite for database management, and SolidJS with DaisyUI for frontend design, dwata emphasizes privacy-focused financial data handling. Distributed under GPL v3 license, it is crafted by Sumit from India, who promotes coding education within his digital nomad community. Keywords: #phi4, Actix-web, DaisyUI, Emails, GPL v3, GitHub, LLM, Linux, Ministral, OAuth, Ollama, Rust, SQLite, SolidJS, Windows, digital nomad, extraction, financial data, macOS, privacy, templates, transactions
    github.com   a day ago
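A template-driven extraction pass of the kind described might look like the sketch below. The prompt wording and field names are illustrative assumptions, not dwata's actual templates; the real tool drives local Ollama models such as the Ministral one mentioned above.

```python
import json

# Hypothetical template -- dwata builds its own per-sender templates
# from email clusters; the field names here are assumed for illustration.
PROMPT = (
    "Extract the transaction from this email as JSON with keys "
    "amount, currency, merchant, and date. Reply with JSON only.\n\nEmail:\n{body}"
)

def build_prompt(body: str) -> str:
    return PROMPT.format(body=body)

def parse_extraction(raw: str) -> dict:
    """Tolerate models that wrap their JSON in prose: take the span
    between the first '{' and the last '}'."""
    start, end = raw.find("{"), raw.rfind("}") + 1
    return json.loads(raw[start:end])
```

The defensive parsing matters in practice: small local models frequently preface valid JSON with conversational filler.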
273.  HN Experimental Ollama Research project for small LLMs
The Infinibay project is a pioneering multi-agent swarm system designed to support autonomous research and software development using small Large Language Models (LLMs) with fewer than 14 billion parameters, all on consumer-grade hardware via a Python-based backend and Node.js frontend. Utilizing an event-driven architecture, it assigns distinct roles such as planning, researching, coding, and reviewing to various agents within the system. This setup supports GPU inference for local models requiring at least 16GB of RAM and 12GB VRAM. Setup involves cloning a repository, configuring environment variables with prefixes like `INFINIBAY_`, and running a start script that installs dependencies, initializes databases, and launches backend and frontend servers. Users have the option to sandbox agents using Podman or Docker for isolated operations. The system has been tested with models including qwen3.5, gpt-oss, glm-4.7-flash, and ministral-3, which demonstrate commendable performance in speed, tool integration, and orchestration capabilities. It allows connections to APIs from providers such as Gemini, OpenAI, and Anthropic, though users must be cognizant of high token usage due to the detailed prompts required for smaller models. Despite its innovative approach, Infinibay faces issues like a non-functional Stop button in the UI and occasional redundant tool executions. As an early prototype, it invites community contributions including bug reports, feedback on agent behavior, and suggestions for improvement, with further details available in the project's LICENSE.md file. Keywords: #phi4, API, Agents, Autonomous, Bugs, Collaboration, Configuration, Containers, Docker, Event-driven, Experimental, Feedback, GPU, Infinibay, License, Models, Multi-agent, Nodejs, Ollama, Orchestration, Podman, Prototype, Prototyping, Python, Research, Sandbox, Small LLMs, Software Development, Swarm System
    github.com   a day ago
   https://github.com/Infinibay/researcher   a day ago
389.  HN Cliniclaw: AI-native HIS attempt with policy-gated clinical agents
Cliniclaw presents an AI-native Health Information System (HIS) designed to enhance clinical workflows through automated processes such as triage, order management, lab review, pharmacy tasks, and documentation. It leverages AI agents that operate under a trust layer named VERITAS, which ensures all actions undergo policy evaluations using the Open Policy Agent's Rego language for compliance. This system stores data in FHIR R4 format to maintain standardization while avoiding proprietary structures. The core design principles of Cliniclaw emphasize security and accountability through default denials of agent actions unless policies explicitly approve them. Human oversight is mandated by policy frameworks instead of relying on user interface conventions, ensuring a robust governance model. Additionally, the system employs cryptographic audit trails for enhanced traceability. It supports various language model backends like Claude API, Ollama, or mock setups, providing flexibility in integration. Cliniclaw's technology stack comprises Rust, axum 0.7, tokio, regorus (Rego), sqlx, reqwest, and Next.js 15, enabling it to address limitations found in conventional systems such as Epic by incorporating AI-driven solutions where traditional infrastructures are inadequate. A demonstration of the system can be accessed via a provided link, and further details about its policy enforcement layer, VERITAS, are available on GitHub. Keywords: #phi4, AI agents, Claude API, Epic, FHIR R4, Nextjs, OPA Rego, Ollama, Rust, SHA-256, VERITAS, axum, clinical encounters, cryptographic audit, documentation, lab review, orders, pharmacy, policy-gated, regorus, sqlx, tokio, triage, trust governance
    news.ycombinator.com   a day ago
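The default-deny principle is easy to illustrate. Cliniclaw evaluates real Rego policies through regorus, but the gating logic amounts to the Python sketch below; the action shape and the sample policy are invented for illustration.

```python
def gate(action: dict, policies: list) -> bool:
    """Default deny: an agent action passes only if some policy
    explicitly allows it. No policies means nothing is allowed."""
    return any(policy(action) for policy in policies)

# Illustrative policy (hypothetical): allow only low-risk lab orders.
def allow_low_risk_lab_orders(action: dict) -> bool:
    return action.get("type") == "lab_order" and action.get("risk") == "low"
```

The inversion is the point: safety comes from the absence of a matching allow rule rather than from the presence of a deny rule, so forgetting to write a policy fails closed.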
535.  HN Giving local LLMs read-only institutional memory before task execution
To mitigate avoidable errors in a local language model (LLM) for code generation and execution, the author improved their three-tier agentic framework by integrating stateful context into task delegation processes. This enhancement involves implementing an enrichment pipeline prior to each call to the local LLM (Qwen2.5-Coder 32B). The pipeline extracts relevant data from databases such as Qdrant, Postgres, and Neo4j, encompassing past operations, ongoing mandates, and pending tasks, and infuses this "institutional memory" into the system prompt in a read-only manner. Incorporating this contextual information helps prevent repetitive errors, including suggesting previously unsuccessful methods or neglecting current project contexts. The approach involves setting constraints to ensure the local model only uses but does not alter data for task execution, effectively reducing issues like invalid RAID command loops. However, an ongoing challenge is managing potential context window pollution as execution memory accumulates over time. Currently, semantic searches with specific filtering parameters are employed, while further insights into sustainable long-term strategies are being explored. The system stack comprises Qdrant, Postgres, Neo4j, and Ollama. Keywords: #phi4, Neo4j, Ollama, Postgres, Qdrant, Qwen2.5-Coder, RAID commands, Three-tier agentic system, asyncio.gather, cloud LLM, code generation, enrichment pipeline, execution memory, hardware-specific mistake, institutional memory, local model, read-only boundary, score_threshold, semantic search, stateless delegation
    news.ycombinator.com   2 days ago
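The read-only injection step might look roughly like this. The delimiter text and character budget are assumptions; the real pipeline pulls its memories from Qdrant, Postgres, and Neo4j rather than an in-memory list.

```python
def enrich_system_prompt(base: str, memories: list, max_chars: int = 2000) -> str:
    """Prepend retrieved institutional memory to the system prompt,
    newest-first, truncating the oldest entries to cap context growth.
    (Budget and header wording are illustrative assumptions.)"""
    block = ""
    for memory in reversed(memories):          # newest entries win the budget
        if len(block) + len(memory) + 1 > max_chars:
            break
        block = memory + "\n" + block          # keep chronological order
    return (base
            + "\n\n# Institutional memory (read-only; do not modify):\n"
            + block).strip()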
548.  HN Show HN: Whichllm – Find and run the best local LLM for your hardware
WhichLLM is a command-line utility designed to facilitate the selection and execution of the most suitable local Large Language Models (LLMs) based on users' hardware specifications. The tool automatically identifies key system components such as GPUs, CPUs, and RAM configurations across various platforms including NVIDIA, AMD, Apple Silicon, or CPU-only systems. It ranks models available on HuggingFace according to criteria like VRAM compatibility, processing speed, and benchmark performance. This ranking allows WhichLLM to streamline the model running process through a single command execution without requiring manual installations. Additionally, it provides Python code snippets for easy implementation of selected models and outputs results in JSON format for seamless integration into other applications. The software offers functionalities such as simulating different GPU environments or planning hardware upgrades necessary for running specific models, enhancing its utility for users with varying computing resources. Commands like `whichllm run` automatically identify the optimal model for a system's specifications and initiate a chat session, while also allowing filtering based on use cases including general tasks, coding, vision processing, or mathematical computations. Integration with other tools such as Ollama is possible to facilitate direct execution of top-ranked models. Installation options include pipx, Homebrew, or pip, making it accessible for users across different systems. The tool's architecture consists of modules dedicated to hardware detection, model retrieval and ranking, performance estimation, and output presentation. Contributions to the project are encouraged, as it is open-source under the MIT license. It supports Python 3.11+ and includes native GPU detection specifically for NVIDIA devices, ensuring broad compatibility and functionality across diverse computing environments. 
Keywords: #phi4, AMD, Apple Silicon, CPU, Chatbot Arena ELO, GGUF, GPU, HuggingFace, JSON output, LLM, NVIDIA, Ollama, Open LLM Leaderboard, Python snippet, RAM, Typer CLI, VRAM estimation, benchmark, cache, contributions, development, hardware detection, inference speed, installation, integration, model compatibility, model formats, performance estimation, quantization, ranking, scoring engine, shell alias
    github.com   2 days ago
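The core ranking step reduces to something like the sketch below. The field names are assumptions, and the real tool also weights inference speed, quantization, and benchmark sources like Chatbot Arena ELO rather than a single score.

```python
def rank_models(models: list, vram_gb: float) -> list:
    """Drop models whose estimated VRAM requirement exceeds the hardware,
    then sort the remainder by benchmark score, best first.
    (Dict keys 'vram_gb' and 'score' are invented for illustration.)"""
    fits = [m for m in models if m["vram_gb"] <= vram_gb]
    return sorted(fits, key=lambda m: m["score"], reverse=True)
```

Filtering before scoring is the essential order of operations: a model that cannot fit in VRAM is useless no matter how well it benchmarks.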
568.  HN Show HN: commitgen-cc – Generate Conventional Commit message locally with Ollama
Commitgen-cc is a tool designed to automate the generation of Conventional Commit messages by analyzing staged Git changes, leveraging an Ollama model running locally to propose commit messages. Users can either accept these suggestions or engage in further customization such as editing, regenerating, or canceling them. The primary workflow involves staging files followed by running commitgen-cc to examine the generated message. The tool's key features include a local integration with an Ollama instance and various configurable options to tailor its behavior according to user preferences, such as model choice, host URL settings, and message constraints. It offers modes like dry-run for testing purposes and supports JSON outputs that facilitate integration into Continuous Integration (CI) systems. Additionally, it remembers accepted messages to refine future suggestions. Installation is straightforward with global deployment via `npm install -g commitgen-cc` or one-time execution using `npx commitgen-cc`. Advanced users can customize models and hosts or enforce specific commit structures through command options and environment variables for consistent local defaults. For team integration, commitgen-cc supports repository-level configurations through a `.commitgen.json` file and provides hooks to enforce policies such as ticket referencing or scope specification. It includes functionalities like `install-hook`, `uninstall-hook`, and `lint-message` to facilitate seamless workflow integration and message validation within CI systems. The tool is well-suited for Continuous Integration environments, offering JSON outputs that can be incorporated into GitHub Actions for automated commit title or pull request description validation. Furthermore, release management is streamlined using GitHub Actions workflows that encompass checks and secure publishing to npm through trusted publishing mechanisms based on predefined criteria. 
Overall, commitgen-cc enhances the creation of Conventional Commit messages with its robust customization options, team integration capabilities, and seamless CI/CD pipeline support, making it a valuable tool for modern software development practices. Keywords: #phi4, CI, Conventional Commit, GitHub Actions, JSON, Nodejs, Ollama, commitgen-cc, environment variables, git, hooks, lint-message, npm version, repo config
    github.com   2 days ago
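A check in the spirit of `lint-message` could be sketched as follows. The exact types and limits commitgen-cc enforces are configurable, so this regex is an assumption based on the Conventional Commits format rather than the tool's own rules.

```python
import re

# Conventional Commits title: type(scope)?!?: subject, capped at 72 chars here.
TITLE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9-]+\))?(!)?: \S.{0,71}$"
)

def lint_title(message: str) -> bool:
    """Validate only the first line of a commit message."""
    return bool(TITLE.match(message.splitlines()[0])) if message else False
```

A hook like this would sit well in CI next to the tool's JSON output mode, rejecting generated titles that drift from the convention.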
600.  HN Show HN: Bayesian intelligence – geopolitical predictions from live public data
The text introduces "Bayesian intelligence," a local-first analytical tool designed for making geopolitical predictions using live public data sources such as GDELT, Google News, Wikipedia, and private web searches. Utilizing Bayesian mathematics, it delivers probabilistic assessments backed by traceable chains of evidence, with source reliability weighting (e.g., World Bank at 0.93, state media at 0.25) allowing for dynamic probability adjustments based on new information. The tool offers five demo assessments focused on the Russia/Ukraine situation and includes a comprehensive 1010-node knowledge graph. It enables users to update data locally using "Ingest Now" via Docker Compose accessible at `http://localhost:8888`, eliminating the need for cloud services or accounts. Additionally, it supports running Ollama locally for enhanced AI-assisted analysis, including summarization and hypothetical scenario exploration. The tool is open-source and available on GitHub under the repository [intel-analyst](https://github.com/dx111ge/intel-analyst). Keywords: #phi4, AI-assisted analysis, Bayesian intelligence, Bayesian math, Docker compose, GDELT, GitHub, Google News, Ollama, Wikipedia, evidence chains, geopolitical predictions, live public data, local-first tool, private web search, probabilistic assessments, reliability tier
    news.ycombinator.com   2 days ago
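One way the reliability weights could feed a Bayesian update, shown as a sketch: the project's actual formula is not given, but shrinking the likelihood ratio toward 1 (no information) for less reliable sources is a common choice.

```python
def update(prior: float, likelihood_ratio: float, reliability: float) -> float:
    """Bayesian update in odds form, with the evidence's likelihood ratio
    shrunk toward 1 in proportion to source reliability (0..1).
    (Assumed weighting scheme, not necessarily the tool's own.)"""
    lr = 1.0 + reliability * (likelihood_ratio - 1.0)
    odds = (prior / (1.0 - prior)) * lr
    return odds / (1.0 + odds)
```

Under this scheme a World Bank report (0.93) moves the probability almost as far as perfectly trusted evidence would, while state media (0.25) nudges it only a quarter of the way.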
763.  HN Ask HN: How are you handling persistent memory across local Ollama sessions
The author explores the difficulties encountered while maintaining context across local Ollama AI tool sessions, where each session begins without prior knowledge, leading to inefficiencies when handled manually. To address this, a proxy solution was developed that stores and injects recent interactions at the start of new sessions, though confidence in its architecture is limited due to the author's non-computer science background. A significant challenge remains with scoping—preventing project contexts from mixing during simultaneous work on multiple projects, currently managed through separate directories but perceived as a temporary fix rather than a robust solution. The author seeks advice on more effective methods for persistent memory and clean scoping, inquiring about potential applications of vector databases, plain files, or MCP-based systems to improve this process. Keywords: #phi4, AI tools, MCP based, Ollama sessions, Persistent memory, context retention, local storage, project separation, proxy solution, retrieval, session scoping, stateless workflow, vector DB
    news.ycombinator.com   3 days ago
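One answer to the scoping question is to key the memory store by project rather than by directory. A minimal sketch, assuming a fixed-size recency buffer; a vector DB would replace the deque with similarity search over the same per-project partitions:

```python
from collections import defaultdict, deque

class ScopedMemory:
    """Keep the last N exchanges per project and inject only that
    project's context into a new session."""

    def __init__(self, per_project: int = 5):
        self.store = defaultdict(lambda: deque(maxlen=per_project))

    def remember(self, project: str, exchange: str):
        self.store[project].append(exchange)

    def context_for(self, project: str) -> str:
        """Context to prepend to a new session; never crosses projects."""
        return "\n".join(self.store[project])
```

The same partition-key idea carries over directly to Qdrant-style payload filters or to one plain file per project, so the storage backend can change without changing the scoping guarantee.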
830.  HN Show HN: Herd – Session-affine process pool for Go
Herd is a session-affine process pool library designed for Go that efficiently manages OS subprocesses while ensuring strict session affinity in routing HTTP traffic, so each session ID consistently maps to the same subprocess. This capability allows stateful binaries, such as headless browsers or language models, to operate as multi-tenant services without requiring complex coordination layers. Herd's key features include guaranteed session-to-worker routing, auto-scaling of workers based on demand, and eviction of idle workers using TTL (Time-To-Live). Additionally, it offers health monitoring for automatic replacement of failed processes and protects against simultaneous worker spawns through singleflight acquisition. The library supports various client types with its generic pool mechanism and incorporates a built-in reverse proxy to manage session lifecycles. Installation is simplified via `go get github.com/hackstrix/herd`, and documentation provides examples like transforming Ollama serve into a multi-tenant language model gateway, ensuring dedicated processes for each user, enhancing resource management. Herd's architecture centers around core interfaces such as Worker[C], WorkerFactory[C], and Pool[C], which manage subprocess instances, spawn new workers, and route sessions respectively. Configuration options include auto-scaling bounds, idle TTL settings, polling intervals for health checks, and custom crash handlers. The library is MIT licensed, encouraging community contributions and reviews. Keywords: #phi4, Auto-Scaling, Configuration Options, Go, HTTP Traffic, Health Monitoring, Herd, License, Multi-Agent Gateway, Ollama, Pool Router, Process Pool, Reverse Proxy, Session Affinity, Singleflight Acquisition, Subprocesses, TTL Eviction, Worker Factory, Workers
    github.com   3 days ago
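Herd is a Go library, but the affinity guarantee it makes can be illustrated in a few lines of Python. Note the caveat: a bare hash like this breaks affinity when the pool resizes, which is presumably why Herd routes through an explicit session-to-worker mapping with TTL eviction rather than pure hashing.

```python
import hashlib

def route(session_id: str, n_workers: int) -> int:
    """Stable routing: the same session ID always lands on the same
    worker index for a fixed pool size. (Illustration of the affinity
    property only, not Herd's actual router.)"""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers
```

For a stateful backend such as an Ollama subprocess per user, this determinism is what lets conversation state live inside the worker instead of in a shared store.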
970.  HN FASTEST LLM decode engine on Apple Silicon. 658 tok/s on M4 Max, beats MLX by 19%
MetalRT has emerged as the leading large language model (LLM) decode engine on Apple Silicon, particularly excelling on the M4 Max chip with a remarkable speed of 658 tokens per second. This performance surpasses the MLX framework by 19% and is notably faster than alternative engines like uzu, llama.cpp, and Ollama. The evaluation involved four quantized models—Qwen3-0.6B, Qwen3-4B, Llama-3.2-3B, and LFM2.5-1.2B—operating on an Apple M4 Max with 64 GB of RAM under macOS 26.3. MetalRT achieved superior performance in three out of four models tested, demonstrating a speed increase ranging from 1.10x to 2.40x over mlx-lm and llama.cpp respectively. It recorded its fastest response at 6.6 milliseconds for the first token of the Qwen3-0.6B model. Although uzu exhibited superior performance on Llama-3.2-3B, MetalRT consistently maintained higher decode speeds across models, positioning it as optimal for fast-response applications like chat interfaces and voice systems. The benchmark ensured fairness by using identical model files for MetalRT and mlx-lm; however, llama.cpp and Ollama used GGUF files with additional REST API overhead. Despite these differences, the output quality remained consistent across all engines, highlighting that performance variations were purely in terms of speed. Keywords: #phi4, 4-bit quantized, Apple Silicon, LLM, M4 Max, MLX, MetalRT, Ollama, REST API, benchmarking, chat apps, decode engine, inference framework, llamacpp, macOS, privacy-first apps, speedup, throughput, time-to-first-token, tokens per second
    www.runanywhere.ai   4 days ago
1578.  HN Show HN: Teaching Tokens: Implementing Private, Lightweight AI in the Classroom
"Show HN: Teaching Tokens" presents an innovative app designed for classroom use, aimed at facilitating the teaching of AI fundamentals through private, lightweight AI applications. The app streamlines the educational process by enabling educators to install an Ollama Docker container, pull a large language model with 1 billion parameters, and initiate a web-based chat interface for interactive learning experiences. This setup allows for one-click deployment of various other models, enhancing flexibility in teaching diverse AI concepts. Additionally, a lesson plan is provided on GitHub specifically tailored for educators using Kali Linux, ensuring structured guidance. The overarching goal of this app is to democratize AI education by making it more accessible and engaging through interactive and manageable technological tools. Keywords: #phi4, 1B Parameter model, App, Chat, Classroom, Deploy, Deploy models, Docker, GitHub, Image, Image view Keywords: Teaching Tokens, Interface, Kali, LLM, Lesson, Lesson plan, Model, Models, Ollama, Ollama Docker Container, One-click, One-click deploy, Parameters, Plan, Private AI, Script, Setup script, Teaching Tokens, View, WebUI, WebUI chat interface
    medium.com   7 days ago
1595.  HN Show HN: AuraText – Like Grammarly for AI prompts, works in every Windows app
AuraText is a free, floating overlay application designed for Windows to enhance AI prompt optimization across various platforms such as Notion, VS Code, Slack, and Word. It refines vague prompts using established frameworks like RISEN, COSTAR, and RTF, significantly improving the quality of AI-generated outputs. The app includes an AI router that intelligently selects the most appropriate model for different tasks—Claude for analytical purposes, GPT-4 for creative tasks, and Gemini for research-related activities. Users also have the flexibility to integrate their own API keys from a range of providers, including local Ollama services. Developed independently over four months by a solo developer, AuraText has already achieved significant traction with over 1,000 downloads during its beta phase. The app is poised to introduce several key features, such as a Trust Layer for verifying AI outputs, a Skill Dashboard to monitor and enhance prompt quality, and a Learning Mode designed to improve users' interaction skills with AI tools. Its universal integration capability on Windows facilitates smooth transitions between applications without needing the Alt-Tab function, further supported by Smart Cursor Lock for efficient text insertion. These features collectively position AuraText as an innovative tool in optimizing AI interactions across different work environments. Keywords: #phi4, AI models, AI prompts, API keys, AuraText, COSTAR, Learning Mode, Ollama, RISEN, RTF, Skill Dashboard, Smart Cursor Lock, Trust Layer, Universal integration, Windows app, overlay
    auratxt.com   7 days ago
1826.  HN Show HN: I built an AI data analyst that never sees your data
QueryVeil is an innovative AI-powered data analysis tool designed to function entirely within the browser, ensuring user data privacy by leveraging schema information—such as column names and types—instead of actual data. This approach facilitates generating SQL queries using DuckDB WebAssembly locally, thus avoiding the transfer of sensitive data to external servers. The system comprises three main layers: a local data engine, schema extraction, and AI-driven query generation that can operate both on the cloud or locally. The development of QueryVeil was driven by the author's experience as a data analyst, where rapid querying often clashed with data privacy concerns. While tools like ChatGPT accelerate analysis, they pose privacy risks due to their reliance on sending data to external servers. By focusing solely on schema information, QueryVeil offers a secure and efficient solution for data analysis. The architecture of QueryVeil involves extracting metadata from files without uploading them, allowing AI models—either local or cloud-based—to generate SQL queries that are processed within the browser. The tool incorporates enhancements such as handling complex queries via a LangGraph agent for multi-step analysis, managing performance limits with clear error messaging, and enabling verifiability of data claims through browser DevTools. For users prioritizing stringent privacy controls, QueryVeil provides local AI options like WebLLM and Ollama to keep the entire process isolated. The tool supports various file formats including CSVs, Excel, Parquet, and JSON files, with plans to expand its capabilities to connect with remote databases while adhering to schema-only analysis principles. Ultimately, QueryVeil aims to harmonize speed and safety in data analysis tools, empowering users to verify privacy claims through browser tools. 
Its flexible architecture allows for seamless switching between local and cloud AI resources, ensuring both efficiency and security in data handling. Keywords: #phi4, AI data analyst, DuckDB WebAssembly, LangGraph agent, Ollama, SQL generation, WebLLM, browser-based, cloud AI, local processing, multi-step queries, privacy, schema analysis
    www.queryveil.com   8 days ago
   https://app.queryveil.com/demo   8 days ago
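The schema-only contract can be illustrated with a short sketch. QueryVeil derives schemas from files loaded into DuckDB WebAssembly rather than Python dicts, but the principle, sending only column names and types and never values, is the same:

```python
def extract_schema(rows: list) -> dict:
    """Derive column-name -> type-name metadata from sample rows.
    Only this mapping (never the cell values) would be shared with
    the model that generates SQL."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            schema.setdefault(col, type(val).__name__)
    return schema
```

A user can verify the claim exactly as the summary suggests: the network tab should show payloads containing this mapping and nothing resembling row data.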
2213.  HN Show HN: Timber – Ollama for classical ML models, 336x faster than Python
Timber is a specialized tool designed to enhance the performance of classical machine learning models during inference, significantly increasing prediction speed by up to 336 times compared to Python-based XGBoost single-sample predictions. It achieves this efficiency by compiling models into native C binaries and serving them through a local HTTP API, thereby eliminating the need for a Python runtime during inference and achieving sub-microsecond latency. Timber is particularly suited for teams that require rapid, predictable, and portable model inference such as those in fraud/risk detection, edge/IoT deployments, regulated industries needing deterministic artifacts, and platform/infrastructure teams looking to minimize Python overhead through native binaries. The tool supports models from various frameworks, including XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX. It offers a streamlined setup process with a simple load-and-serve workflow and a minimalistic API for model serving and health checks. Users can quickly get started by installing the compiler via pip, loading supported models using Timber's command-line interface, and serving them locally to make prediction requests. Timber supports multiple formats: JSON and text for XGBoost and LightGBM, pickle format for scikit-learn, ONNX (ML opset TreeEnsemble) for tree ensemble operators, and JSON exports for CatBoost. Benchmarks conducted on an Apple M2 Pro with 16 GB RAM using the breast_cancer dataset from sklearn demonstrated Timber's superior performance in in-process latency when compared to Python XGBoost, excluding network round-trip time. However, Timber does have certain limitations; ONNX support is confined to tree ensemble operators, CatBoost requires JSON exports, and scikit-learn parsing may struggle with uncommon custom estimators. 
The development roadmap for Timber includes expanding framework compatibility, supporting a broader range of ONNX operators, enhancing embedded deployment profiles, providing richer benchmarks, and improving tools for regulatory compliance. The project encourages community contributions with guidelines available in its repository and operates under an Apache-2.0 license. For those interested in more detailed insights into Timber's methodology and applications, a technical paper is provided as further reading. Keywords: #phi4, ARM Cortex-M, Apache-2.0 license, CatBoost, HTTP API, LightGBM, MISRA-C, ML models, ONNX, Ollama, Python runtime, RISC-V, Timber, XGBoost, audit trails, benchmarks, deterministic artifacts, edge/IoT, inference, latency, microsecond latency, model-serving, native C, scikit-learn
    github.com   9 days ago
   https://gist.github.com/msteiner-google/5f03534b0df58d3   9 days ago
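A harness in the spirit of the in-process latency comparison can be sketched like this; the warm-up and iteration counts are arbitrary choices here, not the benchmark's published methodology, which ran on an Apple M2 Pro with the sklearn breast_cancer dataset.

```python
import time

def avg_latency_us(predict, sample, warmup: int = 100, iters: int = 10_000) -> float:
    """Average single-sample, in-process prediction latency in microseconds,
    deliberately excluding any network round trip (as the Timber
    benchmarks do when comparing against Python XGBoost)."""
    for _ in range(warmup):            # warm caches / JIT paths first
        predict(sample)
    t0 = time.perf_counter()
    for _ in range(iters):
        predict(sample)
    return (time.perf_counter() - t0) / iters * 1e6
```

Measured this way, `python_latency / timber_latency` gives the speedup figure; note that any HTTP serving layer would need to be benchmarked separately, since it adds round-trip overhead on top of the in-process number.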
2241.  HN Show HN: A local AI news aggregator built with Vue 3, FastAPI, and Ollama
The article presents a newly developed local AI news aggregator created with technologies including Vue 3, FastAPI, and Ollama. The developers have extended an invitation to users for feedback, highlighting the significance of user input in refining and enhancing the application's development process. To foster communication and gather insights or suggestions effectively, they are also seeking contact details via email from interested parties. This approach underscores their commitment to evolving the tool based on community engagement and constructive criticism. Keywords: #phi4, AI, FastAPI, Ollama, Show HN, Vue 3, contact, email address, feedback, local, news aggregator
    github.com   9 days ago
2422.  HN Show HN: Chatlite – simple Ollama desktop chat app under 5 MB
Chatlite is a streamlined desktop chat application that prioritizes simplicity and minimal resource consumption. Developed as an alternative to heavier or web-dependent interfaces, it provides a lightweight way to interact with local models. Built on Tauri for native desktop functionality, it stays under 5 MB with low memory usage, stores chats locally with encryption and password protection, and favors a keyboard-first user experience. Chatlite is available on GitHub, where the developer invites feedback on improving integration with local Large Language Model workflows; a contact email is also listed for suggestions, though it is omitted here. Keywords: #phi4, Chatlite, GitHub repository, LLM workflows, Ollama, Tauri, desktop app, encrypted chats, feedback, keyboard-first UX, low memory footprint, native desktop, password lock, small app size
    The google logo   github.com 11 days ago
2462.  HN Show HN: Externalizing Developers' Intuition as Code
Dev Sentinel is an innovative tool designed to enhance engineers' problem-solving abilities by transforming coding challenges into structured knowledge. It operates during developers' coding sessions with Claude Code, identifying moments of difficulty without modifying the prompts. This process generates memories from raw failures that are refined and validated, helping users connect these experiences across different contexts to uncover root causes of issues. To utilize Dev Sentinel, one must clone its repository and install dependencies using npm, along with setting up Ollama locally or AWS Bedrock for optimal functionality. The setup requires initializing the tool within a project directory to establish necessary hooks and configuration files. Once installed, users can access a local web dashboard that displays captured experiences and emerging patterns. Struggles identified during coding sessions are reviewed and confirmed, subsequently summarized into stored knowledge, which aids in refining problem-solving skills over time. Dev Sentinel provides extensive documentation through markdown files covering commands, settings, and usage examples to assist users in maximizing its benefits. The tool is open-source under the MIT license, while its default models (Qwen3) operate under the Apache 2.0 license provided by Alibaba Cloud. Keywords: #phi4, AWS Bedrock, Apache 20 License, Claude Code, Dev Sentinel, Engineering Intuition, Experience Generation, Git Clone, Knowledge Reuse, Ollama, Pattern Connection, Qwen3 Models, Struggle Equity, npm Install
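The struggle-to-lesson pipeline described above can be sketched as follows; the function names, the repeat threshold, and the draft record shape are illustrative assumptions, not taken from the Dev Sentinel codebase.

```python
from collections import Counter

def detect_struggles(events, threshold=3):
    """Flag error signatures that repeat often enough in a session to count
    as a struggle worth capturing (threshold is an assumed heuristic)."""
    counts = Counter(e["error"] for e in events if e.get("error"))
    return sorted(err for err, n in counts.items() if n >= threshold)

def draft_lesson(error, fix):
    """Turn a confirmed struggle into a draft lesson awaiting user review,
    mirroring the review-then-store flow the summary describes."""
    return {"trigger": error, "lesson": fix, "status": "draft"}
```

After the user confirms a draft on the dashboard, it would be promoted from "draft" to stored knowledge and surfaced when the same trigger recurs.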
    The google logo   github.com 11 days ago
2512.  HN Show HN: H-CLI – Manage network infrastructure with natural language
H-CLI is an advanced Telegram bot created to facilitate network infrastructure management through natural language commands, developed by a network engineer with expertise in parallel SSH tooling across various vendors. The bot integrates artificial intelligence models like Claude Code or self-hosted alternatives to interpret and execute tasks such as discovering CLOS fabrics and deploying EVE-NG labs via plain English instructions. Key features include the ability to perform parallel REST API calls, automate lab deployments with EVE-NG, render Grafana dashboards directly through Telegram, and possess teachable skills that allow it to learn from user interactions. It employs memory systems that utilize chunk-based conversations and vector memory for retaining long-term knowledge. The bot prioritizes safety by employing a layered security model akin to Asimov's Laws of Robotics. This involves using two distinct AI models: one responsible for executing commands and another acting as an independent judge to ensure command safety. H-CLI’s infrastructure is supported by Docker Compose with nine containers, incorporates pattern denylists, network isolation, non-root privileges, and HMAC-signed results to enhance security. For deployment, users need to configure the system similarly to setting up a monitoring tool, employing read-only credentials and ensuring restricted access. Additional functionalities include vector memory for semantic search of past interactions, performance metrics through a monitor stack, backup options, and data export capabilities essential for training models. As an open-source project under the MIT license, H-CLI is tailored for engineers seeking adaptable tools that improve with interaction over time. 
Keywords: #phi4, AI agent teams, AI brain, Asimov firewall, Claude Code, Docker Compose, Docker networks, EVE-NG lab, Grafana dashboard, HMAC-signed results, NetBox, Network infrastructure, Ollama, Qdrant database, REST API calls, Redis scaling, Telegram bot, TimescaleDB, audit trail, backup & sync, chunk-based conversation memory, horizontal scaling, log4AI logger, natural language, network automation, non-root containers, parallel SSH, pattern denylist, security hardening, semantic analysis, shell command logger, vLLM, vector memory
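The two-model safety layer described above (one model proposes a command, an independent judge vets it against a pattern denylist) can be sketched in miniature. The stub models, the task-to-command mapping, and the specific denylist regexes are illustrative assumptions; in H-CLI both roles would be played by real LLMs.

```python
import re

# Illustrative denylist patterns; H-CLI's actual pattern denylist is not shown here.
DENYLIST = [r"\brm\s+-rf\b", r"\bshutdown\b", r"\breload\b", r"\bwrite\s+erase\b"]

def executor_model(task: str) -> str:
    """Stand-in for the command-generating model (a real deployment calls an LLM)."""
    return {"show bgp summary": "show ip bgp summary"}.get(task, f"echo unsupported: {task}")

def judge_model(command: str) -> bool:
    """Stand-in for the independent judge: reject anything matching the denylist."""
    return not any(re.search(p, command) for p in DENYLIST)

def run(task: str) -> str:
    """Only execute a proposed command if the judge approves it."""
    command = executor_model(task)
    if not judge_model(command):
        return f"BLOCKED: {command}"
    return f"EXECUTED: {command}"
```

Keeping the judge independent of the executor means a prompt-injected executor still cannot get a destructive command past the gate.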
    The google logo   github.com 11 days ago
2576.  HN Externalizing Developers' Intuition as Code
Dev Sentinel is a sophisticated tool engineered to externalize developers' intuition into structured knowledge through monitoring Claude Code sessions. It identifies moments of developer frustration and transforms these challenges into reusable lessons without altering user inputs. The core functionality involves detecting when struggles occur, refining these incidents into concrete learning points, validating them to prevent repetition, and linking patterns across various scenarios to uncover root causes, thereby enhancing engineering intuition cumulatively. To set up Dev Sentinel, users must clone its GitHub repository and navigate into the project directory where they can utilize npm commands for installation. The tool relies on Ollama for model management and offers optional integration with AWS Bedrock to boost performance capabilities. Users need to initialize hooks within their project directories to capture specific session events effectively. Once operational, Dev Sentinel allows users to review drafts of captured struggles through a local dashboard or command-line tools, enabling them to confirm and store these experiences as structured lessons. The tool is distributed under the MIT license, ensuring open-source accessibility, while its default models are governed by Alibaba Cloud's Apache 2.0 license. Comprehensive setup instructions and configuration details can be found in documentation files such as COMMANDS.md, SETTINGS.md, and EXAMPLE.md, guiding users through the entire process efficiently. Keywords: #phi4, AWS Bedrock, Code, Commands, Dashboard, Dev Sentinel, Drafts, Engineering Intuition, Experience Generation, Externalizing Intuition, Frustration Detection, Git Clone, Hooks, License, Models, Ollama, Review, Settings, Structured Memory, Struggle Equity, npm Install
    The google logo   github.com 11 days ago
2910.  HN I built a 151k-node GraphRAG swarm that autonomously invents SDG solutions
The "PROMETHEUS AGI" project is an innovative initiative aimed at advancing beyond conventional language model applications by employing a sophisticated autonomous 151k-node GraphRAG swarm. Its primary objective is to facilitate cross-domain reasoning in order to propose novel solutions for the United Nations Sustainable Development Goals (SDGs). The project harnesses Neo4j Aura for organizing data and incorporates patent information through Google BigQuery and OpenAlex API, while utilizing Ollama's Llama 3 for entity extraction and Claude 3.5 for comprehensive reasoning processes. A key feature of this system is its ability to identify "Missing Links" by mapping existing problems with available technologies across various domains, subsequently generating concept blueprints for innovative solutions that are not yet patented. One such example is Project HYDRA, a zero-power water purifier. To date, over 261 blueprints have been created as part of this initiative. The project seeks engagement from domain experts to validate these AI-generated ideas and assist in developing prototypes. It also aims to secure funding to expand its graph database beyond one million nodes. Feedback is sought on various aspects such as architecture, the Neo4j schema, and the multi-agent approach employed by PROMETHEUS AGI. The user interface comprises a Streamlit-based digital twin dashboard and a React/Vite landing page, which facilitate interaction with the project's outputs. Links to explore these resources are provided: [Project Prometheus Dashboard](https://project-prometheus-5mqgfvovduuufpp2hypxqo.streamlit.app/) and [PROMETHEUS AGI Landing Page](https://prometheus-agi.tech). 
Keywords: #phi4, Claude 3.5, Google BigQuery, GraphRAG, LLM/RAG, Missing Links, Neo4j Aura, Ollama, OpenAlex API, PROMETHEUS AGI, Project HYDRA, React/Vite, Streamlit, UN SDGs, biofouling, concept blueprints, cross-domain reasoning, digital twin dashboard, domain experts, materials science, multi-agent approach, nanobiology
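The "Missing Links" idea (pairing problems with technologies that no existing patent or edge connects) reduces to an absent-edge query over the graph. The toy graph below shows the shape of that query using only Python; the node names and the dict representation are illustrative, whereas the real system runs over a 151k-node Neo4j store.

```python
# Toy graph: which technologies already solve which SDG problems.
SOLVES = {("solar_desalination", "water_scarcity"), ("drone_seeding", "deforestation")}
PROBLEMS = {"water_scarcity": "SDG6", "deforestation": "SDG15", "crop_failure": "SDG2"}
TECH = {"solar_desalination": "energy", "drone_seeding": "robotics", "soil_sensors": "agritech"}

def missing_links():
    """Enumerate (technology, problem) pairs with no existing 'solves' edge.

    Each such pair is a candidate for a new concept blueprint; the real system
    would score and filter these before drafting anything.
    """
    return sorted((t, p) for t in TECH for p in PROBLEMS if (t, p) not in SOLVES)
```

In Cypher terms this corresponds to matching technology and problem nodes where no `SOLVES` relationship exists between them.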
    The google logo   news.ycombinator.com 13 days ago
3146.  HN An AI agent on an ESP32 that can automate sensors, relays, speak NATS, Telegram
The AI agent described is an innovative product designed for the ESP32 microcontroller, providing comprehensive automation at a cost below $5 without requiring external systems like Raspberry Pi or Home Assistant. This standalone solution emphasizes persistent local automation through a rule engine that manages sensor-triggered actions and complex sequences. It supports multi-channel control via interfaces such as Telegram, USB serial, and NATS, enhancing its versatility. The agent features an advanced tool loop allowing the AI to engage in iterative reasoning and execution of up to 20 tools, adapting from outcomes without depending on large language models or cloud services. User preferences are retained through persistent memory stored in flash storage, ensuring continuity after reboots. The device operates independently of network connections by supporting local execution frameworks like OpenRouter, Ollama, or llama.cpp, negating the need for internet access or API keys. Additional functionalities include a web-based configuration interface accessible from any browser, allowing users to adjust prompts, memory settings, and configurations without needing to reflash the device. It also offers serial bridge capabilities, enabling seamless interaction with other serial devices such as Arduinos and various sensors via UART, enhancing its utility in diverse automation scenarios. Keywords: #phi4, AI agent, ESP32, NATS, Ollama, OpenRouter, Telegram, UART, agentic tool loop, automation, bare metal, edge-triggered, llamacpp, multi-channel control, persistent memory, relays, rule engine, sensors, serial bridge, web config
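The agentic tool loop described above (the model repeatedly picks a tool, observes the result, and stops at a hard cap of 20 calls) can be sketched in a few lines. The tool names, the stand-in `plan`, and the return values are illustrative assumptions; on the device, each step would come from the LLM's response to the previous tool result.

```python
MAX_STEPS = 20  # the agent caps each reasoning episode at 20 tool calls

def gpio_read(pin):          # stand-in for an on-device sensor-read tool
    return {"4": "21.5C"}.get(pin, "unknown")

def relay_set(state):        # stand-in for an on-device relay-control tool
    return f"relay {state}"

TOOLS = {"gpio_read": gpio_read, "relay_set": relay_set}

def agent_loop(plan):
    """Execute model-chosen tool calls until the plan is done or the cap is hit."""
    results = []
    for step, (tool, arg) in enumerate(plan):
        if step >= MAX_STEPS:
            break  # hard ceiling keeps a confused model from looping forever
        results.append(TOOLS[tool](arg))
    return results
```

The fixed cap is what makes the loop safe to run unattended on a microcontroller: worst case, a bad plan wastes 20 tool calls and then yields control back.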
    The google logo   wireclaw.io 14 days ago
3214.  HN BrAIn: Persistent, Human-Inspired Memory for LLM Agents
brAIn is an innovative tool developed to enhance LLM-based agents by providing a persistent, human-inspired memory system, addressing the limitations of stateless agents that restart each interaction without context. This tool employs a structured memory model inspired by neuroscience, consisting of five types: working, episodic, semantic, procedural, and contact memories. These are all stored in a portable `.brain` file, allowing for comprehensive data retention and recall. brAIn's key features include an automatic memory consolidation process akin to a "sleep cycle," where memories undergo decay, promotion, or emergence, enhancing the agent's ability to manage information efficiently over time. Additionally, it offers semantic similarity search capabilities through embeddings, facilitating better understanding and retrieval of related concepts. Compatibility with various AI editors such as Cursor, Claude Code, Codex, and Gemini CLI broadens its applicability across different platforms. For optimal functionality, brAIn requires certain prerequisites: CGO for the SQLite driver and an LLM connection via Ollama for specific operations, though read-only tasks do not necessitate Ollama. The tool is installed using `make install` and operates from a singular data directory located in the user's home, housing essential files like the primary brain store (`agent.brain`), configuration settings, logs, and backup records. A background daemon ensures continuous memory processing by handling tasks such as automatic consolidation and updates without user intervention. Comprehensive documentation is available to guide users through configuring brAIn, managing the daemon, and executing tests or example scenarios, ensuring effective utilization of this advanced memory-enhancing tool. 
Keywords: #phi4, AI editor, LLM agents, Ollama, SQLite, automatic processing, backups, brAIn, brain-daemon, configuration, consolidation, contact memory, daemon, embeddings, emergent skills, environment variables, episodic memory, integration, logs, persistent memory, procedural memory, semantic memory, sleep cycle, working memory
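The "sleep cycle" above, where memories decay, get promoted, or emerge, can be sketched as one consolidation pass over episodic memory. The decay rate, promotion threshold, and record fields are illustrative assumptions, not constants from the brAIn codebase.

```python
DECAY = 0.9          # per-cycle strength decay (assumed value)
PROMOTE_AT = 3       # recalls needed before promotion to semantic memory (assumed)
FORGET_BELOW = 0.2   # strength floor under which a memory is dropped (assumed)

def sleep_cycle(episodic, semantic):
    """One consolidation pass: decay episodic memories, promote frequently
    recalled ones to semantic memory, and forget the ones that have faded."""
    kept = []
    for mem in episodic:
        mem["strength"] *= DECAY
        if mem["recalls"] >= PROMOTE_AT:
            semantic.append(mem["fact"])   # promotion: episodic -> semantic
        elif mem["strength"] >= FORGET_BELOW:
            kept.append(mem)               # retained for the next cycle
    return kept, semantic
```

In brAIn this pass would run inside the background daemon against the `.brain` store, so consolidation happens without user intervention.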
    The google logo   github.com 14 days ago