161.
HN
The Temperature Has Changed
Advancements in generative AI and model-assisted programming are transforming software development by enabling tools that automate code generation, reducing reliance on traditional programming skills. Pioneering models such as Anthropic's Opus and OpenAI's Codex have given rise to what could be considered autonomous developers, capable of handling complex tasks like decoding compressed data without explicit human guidance. These innovations increase productivity but also raise concerns about the future of programming careers, with automation potentially shortening development cycles and reducing workforce requirements.
The implications extend beyond individual roles to influence business models and software economics. AI-generated code challenges traditional Software as a Service (SaaS) frameworks and could centralize power among major tech companies. In response, enterprises are expected to adapt rapidly, focusing on integration capabilities while maintaining quality and reliability in their systems.
Additionally, the dominance of established programming languages due to their extensive training data may diminish the need for new languages, prompting a shift towards smaller, highly skilled teams adept at leveraging AI tools. These teams would be responsible for managing complex systems, facilitating continuous delivery models, and implementing automated testing processes.
While these advancements offer opportunities for innovation and efficiency, they also pose significant challenges in terms of job roles, software quality, and business dynamics within the tech industry. Balancing these opportunities and challenges will be crucial as the sector continues to evolve under the influence of AI-driven technologies.
Keywords: #phi4, Anthropic's Opus, Claude Code, Copilot, Generative AI, GitHubCLI, OpenCode, autonomous developers, continuous delivery, enterprise software, existential threat, full stack engineer, model assisted development, productivity, programming, software creation, software economics, tooling evolution
gist.github.com 13 hours ago
|
289.
HN
Show HN: Cai – AI actions on your clipboard, runs locally (macOS, open source)
Cai is a macOS menu bar application that enhances productivity through intelligent clipboard management with a strong emphasis on privacy and security. Designed for seamless interaction without needing to switch away from the keyboard, Cai identifies the type of content copied to your clipboard—such as text, dates, emails, or addresses—and offers relevant actions like summarizing text, creating calendar events, translating languages, or performing other context-specific tasks.
Central to its functionality is local AI processing using Ministral 3B by default, with options for integration with external servers like LM Studio or Ollama. This ensures that all data processing occurs on the user's device without cloud involvement, maintaining high levels of privacy and security. The application is highly customizable, allowing users to create custom AI prompts, shortcuts for frequent actions, and specify destinations for output—whether in Mail, Notes, or elsewhere.
Cai can be installed through a downloadable .dmg file or directly from its GitHub source code. To enable global hotkey functionality, it requires granting Accessibility permissions. Compatibility is limited to macOS 13.0 (Ventura) or later on Apple Silicon devices, with a disk space requirement of approximately 2.5 GB. The application's key features are focused on providing smart, context-aware actions that improve workflow efficiency while ensuring data remains secure and private.
Keywords: #phi4, AI, Cai, LLM setup, LM Studio, Ministral 3B, Ollama, clipboard, custom shortcuts, installation, local AI, macOS, open source, output destinations, privacy-first, smart actions, tech stack, troubleshooting
github.com a day ago
|
409.
HN
Show HN: QemuClaw – Put the claw in an aquarium (beta)
QemuClaw is a beta release of a one-click deployment tool designed to run OpenClaw, a personal AI assistant, within an isolated QEMU virtual machine, thereby safeguarding the host system from potential vulnerabilities associated with over 1,000 known issues in OpenClaw. The application supports cross-platform functionality for Windows, macOS, and Linux, offering bundled installations on Windows that include necessary tools like QEMU and 7-Zip, while providing instructions for manual setups on other platforms. It allows users to customize VM resources such as memory and CPU allocation during setup and facilitates headless booting with a status window for progress tracking. Additionally, it integrates with local language model providers via host networking, enhancing its utility.
The architecture of QemuClaw employs Electron to manage QEMU processes, featuring capabilities like a serial console and QMP control for comprehensive VM management, port forwarding to access OpenClaw’s Web UI at localhost:18789, and shared folders to facilitate file exchange between the host and the virtual machine. System tray integration offers functionalities such as restarting or updating OpenClaw and terminal access.
To develop or install QemuClaw, requirements include Node.js version 18 or higher, properly configured QEMU PATH, and 7-Zip for Windows users. Released under the MIT license, this open-source tool invites community contributions and modifications.
Keywords: #phi4, AI assistant, Desktop App, Local LLMs, MIT License, OpenClaw, QEMU, QemuClaw, VM Image, architecture, development, isolation, system tray, virtual machine, vulnerabilities
github.com a day ago
|
665.
HN
Claude Code at Trail of Bits
This document provides an exhaustive setup guide for employing Claude Code at Trail of Bits, tailored to enhance security audits, development, and research endeavors. The initial phase involves cloning the repository and executing a configuration command that automates component installation. For optimal efficiency when handling AI session outputs, Ghostty terminal is recommended on macOS due to its low memory usage. The setup process includes installing essential toolchains via Homebrew: software like `jq`, `ripgrep`, and `fd` for general purposes; Python tools (`ruff`, `ty`) for code analysis; Rust tools (`cargo-deny`, `prek`) for dependency management; and Node tools (`oxlint`) for linting. Further, it advises on configuring shell aliases for ease of use, modifying the settings.json file to prioritize privacy and efficiency, and establishing a global CLAUDE.md document that outlines development philosophies and code quality standards.
Sandboxing is underscored as crucial for executing commands securely with the `/sandbox` command, while devcontainers are highlighted for their role in ensuring isolation. Hooks are introduced to enforce safe practices and automate workflows. The management of plugins through Trail of Bits marketplaces is discussed, with an emphasis on using specific skills for security auditing, code reviews, and development tasks.
Advanced configuration aspects include detailed guidance on setting up MCP servers such as Context7 and Exa, managing local models with LM Studio, customizing output styles, employing context management strategies like `/clear` to maintain clarity, selecting appropriate web browsing tools based on task requirements, considering fast mode, creating custom slash commands, and writing skills and agents for security-related tasks. The document also promotes establishing a continuous improvement loop via weekly insights, encourages the creation of project-specific CLAUDE.md files for tailored guidelines, advocates for clean session management to maintain high-quality code output by preventing context window saturation, and discusses using Exa AI or agent-browser tools depending on task specifics.
Overall, the guide is an extensive resource that combines technical setup instructions with best practices in development workflows and project management. Its aim is to leverage Claude Code's full potential within professional environments focused on security, efficiency, and customizability.
Keywords: #phi4, Claude Code, Ghostty, Homebrew, LM Studio, Linux, MCP servers, Python tools, Rust toolchain, Shell Setup, Trail of Bits, WezTerm, Windows support, actionlint, ast-grep, fd, hooks, jq, local models, macOS, macos-trash, node, permissions, pnpm, ripgrep, sandboxing, security audits, shellcheck, shfmt, uv, zizmor
github.com 3 days ago
|
1011.
HN
Show HN: LocalClaw – Find the right local LLM for your exact hardware
LocalClaw is a browser-based tool that helps users find a local Large Language Model (LLM) suited to their specific hardware, keeping all operations on the user's device with no external data transmission to preserve privacy. It is designed to work alongside LM Studio, which runs LLMs offline through a ChatGPT-like interface, eliminating the need for internet connectivity.
The text highlights quantization as a key method to reduce model size while preserving quality, offering various levels such as Q4 (more compressed) and Q8 (less compressed), with Q5_K_M being favored for its balance between compression and performance. Effective execution of local AI models requires at least 2-3 GB of RAM in addition to the model's file size—for instance, a 5 GB model would necessitate approximately 8 GB of RAM.
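As a rough illustration of that rule of thumb (hand-written for this summary, not LocalClaw's code, whose actual heuristics may differ), the estimate is simply the quantized file size plus a 2-3 GB overhead:

```python
# Rough sketch of the RAM rule of thumb described above: quantized model file
# size plus roughly 2-3 GB of overhead for context and runtime.
def estimated_ram_gb(model_file_gb: float, overhead_gb: float = 3.0) -> float:
    return model_file_gb + overhead_gb

print(estimated_ram_gb(5.0))  # a 5 GB model needs roughly 8 GB of RAM
```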
Apple Silicon devices are noted for their efficient resource management due to their unified memory architecture, while NVIDIA GPUs offer faster inference rates but face constraints regarding VRAM capacity. LocalClaw ensures data privacy by running entirely in the browser and abstaining from collecting user data or executing API calls.
The text also provides recommendations for various RAM capacities: models like Qwen 3 8B and Llama 3.3 8B are suggested for systems with 8 GB of RAM; Qwen 3 14B is recommended for those with 16 GB, and both Qwen 3 32B and DeepSeek R1 32B are suitable for 32 GB or larger setups. Additionally, specialized models such as Qwen 2.5 Coder 7B are suggested for coding tasks, Gemma 3 12B for vision-related applications, and the DeepSeek R1 series for reasoning tasks.
Keywords: #phi4, Apple Silicon, DeepSeek R1, LM Studio, Large Language Models, Llama 33, Local AI models, LocalClaw, NVIDIA GPU, Q4, Q5, Q8, Qwen 3, RAM, VRAM, coding, privacy, quantization, reasoning, unified memory, vision
localclaw.io 5 days ago
|
1020.
HN
Show HN: Roe.md generate your own OpenClaw-like bot from a single Markdown file
The project "ROE.md" developed by guld serves as a proof of concept for enabling users to create personalized AI assistants akin to OpenClaw, utilizing a single Markdown file. This initiative is designed to empower users with the ability to generate bespoke agents leveraging AI models such as GPT-oss-20b and tools like OpenCode, while minimizing dependencies. Users can choose various programming languages for agent development, although Python enjoys superior support currently.
To construct an agent using ROE.md, individuals are required to download or clone the project repository, establish a designated directory, and employ their preferred AI coding assistant to interpret the Markdown file and rectify initial bugs. The resulting agents are capable of executing basic commands in command-line interface (CLI) mode. Despite its alpha stage with acknowledged bugs and security concerns, ROE.md incorporates fundamental features such as CLI tools and prospective API integrations for platforms like Gmail and Telegram. It also supports common OpenClaw-like templates to streamline the agent creation process.
The developer underscores the need for caution due to potential security vulnerabilities inherent in AI assistants while encouraging community participation through testing various models or enhancing the core file, with contributions managed via GitHub pull requests. Overall, ROE.md exemplifies an experimental approach towards crafting customizable personal AI agents using "vibe coding," evoking nostalgia of early programming experiences.
Keywords: #phi4, AI assistant, API examples, CLI mode, Kimi-25, LM Studio, Markdown, OpenAI Codex, OpenClaw, Python, ROEmd, SOTA models, agent creation, coding tool, community contribution, gpt-oss-20b, local models, personal assistant, programming language, pseudocode, security issues, templates
github.com 5 days ago
|
1213.
HN
Show HN: Carapace – A security-hardened Rust alternative to OpenClaw
Carapace is an open-source Rust-based personal AI assistant gateway developed as a secure alternative to OpenClaw due to significant vulnerabilities in the latter. Its design emphasizes security through features such as localhost-only binding, OS-level credential storage, and Ed25519-signed WebAssembly (WASM) plugins with sandboxing capabilities, ensuring default access denial without proper credentials. It supports connections to multiple AI providers like Anthropic, OpenAI, Ollama, Gemini, and Bedrock, while also integrating with messaging platforms including Discord, Telegram, Signal, Slack, and webhooks.
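Carapace itself is written in Rust, but the fail-closed, verify-before-load idea behind its Ed25519-signed plugins can be sketched in Python with the `cryptography` package; the function and argument names below are hypothetical and are not Carapace's API.

```python
# Minimal sketch of verifying an Ed25519 signature over a WASM plugin before
# loading it. Fail closed: any verification failure rejects the plugin.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

def load_plugin(wasm_bytes: bytes, signature: bytes, publisher_key: bytes) -> bytes:
    key = Ed25519PublicKey.from_public_bytes(publisher_key)
    try:
        key.verify(signature, wasm_bytes)   # raises InvalidSignature on mismatch
    except InvalidSignature:
        raise PermissionError("plugin signature invalid; refusing to load")
    return wasm_bytes  # only now handed to the sandboxed WASM runtime
```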
Currently in a preview stage, Carapace offers full end-to-end functionality for Discord but lacks a Control UI frontend and complete subprocess sandboxing. Its primary focus is on robust security to mitigate threats such as unauthorized access, exposure of unencrypted secrets, skills supply chain vulnerabilities, prompt injection, and SSRF/DNS rebinding attacks.
Key features of the framework include multi-provider large language model (LLM) support, secure messaging channels, resource-limited execution of WASM plugins, and infrastructure options like TLS/mTLS integration. Although still under development, Carapace lays a foundation for users seeking a hardened AI assistant framework. The project is open to contributions, with comprehensive documentation available on GitHub under the Apache-2.0 license.
Keywords: #phi4, AES-256-GCM encryption, AI assistant, Anthropic, Bedrock, Carapace, Discord, Ed25519-signed, Gemini, OS-level sandbox, Ollama, OpenAI, OpenClaw, Prometheus metrics, Rust, SSRF defense, Signal, Slack, TLS, Telegram, WASM plugins, audit logging, capability sandboxing, fail-closed auth, gateway, localhost-only binding, mTLS, prompt guard, security-hardened, webhooks
github.com 6 days ago
|
1421.
HN
Show HN: Google Search MCP for local LLMs – 14 tools, no API key
The "Google Search MCP for local LLMs," developed by Vincent Kaufmann, is an open-source Model Context Protocol (MCP) server that enables 14 Google-related search functionalities without requiring an API key. By leveraging headless Chromium through Playwright, it scrapes and provides real-time results from services like Google Search, Shopping, Flights, Hotels, Translate, Maps, Weather, Finance, News, Scholar, Books, Images, Trends, and a page fetcher tool. This local server allows integration with local language models (LLMs) such as LM Studio or Claude Desktop, eliminating the need for users to manually teach these LLMs about specific tools.
Installation is user-friendly through `pip` in a virtual environment or via `pipx`, making it accessible through PATH commands. Configuration steps are available for both LM Studio and Claude Desktop environments. The server operates without usage restrictions, as it circumvents API key requirements by rendering JavaScript pages directly using Playwright. Available under the MIT license on GitHub and PyPI, this project offers a free alternative to traditional API-based services, aiming for seamless integration with LLMs for enhanced web search capabilities.
Keywords: #phi4, Academic Search, Books, CLI, Chromium, Claude Desktop, Configuration, Development, Finance, Flight Search, GitHub, Google Search, Headless Browser, Hotel Search, Images, JSON, LM Studio, Local LLMs, MCP Server, MIT License, Maps, News, Page Fetcher, Pipx, Playwright, Product Search, PyPI, Python, Scholar, Translation, Trends, Venv, Virtual Environment, Weather, Web Scraping
github.com 7 days ago
|
1493.
HN
Show HN: Rowboat – AI coworker that turns your work into a knowledge graph (OSS)
Rowboat is an open-source application designed as a local-first AI coworker that leverages Markdown to create a dynamic, living knowledge graph from user-generated content. By integrating with various tools such as Gmail and meeting notes platforms like Granola and Fireflies, Rowboat extracts pertinent information about people, projects, and decisions, organizing it into a context-rich framework that updates automatically as new data becomes available. The application comprises two primary components: a continually evolving context graph that documents commitments, deadlines, and relationships, and a local assistant capable of performing tasks using this contextual knowledge. Users can utilize Rowboat to automate work processes, like generating presentations or meeting briefs, by accessing their comprehensive work context.
What sets Rowboat apart from other AI tools is its ability to maintain long-term memory in transparent, editable Markdown format, rather than relying solely on real-time document searches. This approach supports automation through background tasks and integrates with both local and cloud-based models via the Model Context Protocol (MCP). Data privacy is a critical focus for Rowboat, ensuring all information remains stored locally so users can modify or remove their data at will. Compatible with Mac, Windows, and Linux systems, Rowboat offers flexible integration options with other applications. The project encourages community contributions and seeks user feedback to further enhance productivity through its innovative approach to managing work-related knowledge.
Keywords: #phi4, AI coworker, Apache-20, Gmail, LLM, Markdown, Model Context Protocol (MCP), Obsidian, Rowboat, automation, background agents, context, data storage, editable notes, integration, knowledge graph, local-first, long-lived memory, meeting notes, open-source, privacy, tools, transparency, voice memos, workflows
github.com 8 days ago
https://github.com/getzep/graphiti 8 days ago
|
1509.
HN
Is Local Hardware All You Need?
The article explores whether the investment in new data centers and GPUs for generative AI (GenAI) is necessary, considering potential advancements in leveraging existing local hardware. It identifies two primary trends: improved local stacks and model improvements. Devices like desktops and phones contain underutilized computational power that can efficiently run simplified models through techniques such as distillation. Advancements in inference stacks have significantly enhanced their performance for tasks like coding by offering privacy and offline capabilities. Additionally, there has been progress in optimizing both the inference and training processes to improve performance on current hardware, with innovations like memory lookup techniques and the development of smaller models specifically designed for mobile devices. These improvements can result in substantial cost reductions during model training, as demonstrated by Andrej Karpathy's work.
The implications of these advancements point towards a shift in AI execution from cloud-based data centers to local environments, impacting security and management practices by focusing on monitoring local hardware usage instead of external connections. This shift raises questions about controlling and securing locally-run models, akin to managing installed software. While investments in new data centers continue presently, the trends suggest that future AI workloads may increasingly be managed by existing local hardware, potentially diminishing the need for extensive new infrastructure.
Keywords: #phi4, GPUs, GenAI, GenAI investment, LLM, LLM inference, Local hardware, compute capacity, datacenters, distillation, inference, local stacks, model providers, network connectivity, open source, open source engines, performance, performance improvements, privacy, security, security implications, supply chain, supply chain issues, training
wwws.nightwatchcybersecurity.com 8 days ago
|
1524.
HN
Bardacle – Session awareness for AI agents using local LLMs
Bardacle is an advanced metacognitive tool designed to enhance AI agents' session awareness by maintaining a persistent "session state" summary, which acts as short-term memory across context losses or restarts. This functionality ensures continuous task tracking beyond simple conversation history and includes summaries of tool interactions, thereby enhancing both metacognitive and tool awareness. The system adopts a local-first approach, prioritizing data privacy by using local Large Language Models (LLMs) like LM Studio and Ollama, while also providing cloud fallback options with Groq or OpenAI if local resources are unavailable. Rate limit detection features automatically bypass providers when necessary.
The setup process for Bardacle involves cloning the repository, installing dependencies, and configuring paths for transcripts and outputs. Users can test their setup, start a daemon, or check the system status through specific commands. To integrate with agents effectively, they can access `session-state.md` at each response's beginning to maintain contextual awareness.
Bardacle's technical framework includes a fallback chain prioritizing local LLM inference, followed by cloud services like Groq and OpenAI, while considering rate limits. The tool supports Docker for containerized deployment and generates session state in markdown format, capturing goals, tasks, decisions, blockers, next steps, and context. Version 0.2.0 introduces reliability enhancements such as atomic file writes to prevent corruption, automatic backups with configurable retention for state recovery, provider health checks to reduce failover time, and emergency state saving for crash recovery.
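The fallback chain described above reduces to a try-in-order loop; the sketch below is a minimal Python illustration with hypothetical provider callables, not Bardacle's actual implementation.

```python
# Minimal sketch of a local-first fallback chain: try the local LLM first,
# then cloud providers, skipping any that are rate-limited or unreachable.
def generate_session_state(prompt: str, providers: list) -> str:
    last_error = None
    for name, call in providers:       # e.g. [("lm_studio", ...), ("ollama", ...),
        try:                           #       ("groq", ...), ("openai", ...)]
            return call(prompt)
        except Exception as err:       # rate limit, timeout, connection refused
            last_error = err           # fall through to the next provider
    raise RuntimeError(f"all providers failed: {last_error}")
```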
Bardacle is open to contributions and provides comprehensive installation guides, documentation, and troubleshooting support. The project operates under the MIT License, developed by Bob and Blair at OpenClaw, leveraging various AI research foundations.
Keywords: #phi4, AI agents, Bardacle, Docker, Groq, Ollama support, OpenAI, atomic file writes, automatic backups, cloud fallback, configuration, context loss, contributing, crash recovery, development, incremental updates, inference, license, local LLMs, local-first, markdown format, metacognitive layer, provider health checks, rate limit detection, reliability features, session awareness, session state, tool calls
github.com 8 days ago
|
1583.
HN
LightRag / GraphRag Implementation in Rust
EdgeQuake is a sophisticated framework for transforming documents into knowledge graphs, built in Rust for performance. It departs from traditional Retrieval-Augmented Generation (RAG) systems by employing the LightRAG algorithm to break documents down into entities and relationships, enabling complex queries that include multi-hop reasoning and thematic analysis. The framework offers several key features: it leverages Large Language Models (LLMs) for entity extraction and relationship mapping, provides six query modes optimized for different types of questions, and is built on an asynchronous Tokio architecture with zero-copy operations for strong concurrency and memory efficiency. Additionally, EdgeQuake provides advanced PDF processing capabilities such as table detection, OCR, and multi-column layout handling.
The system includes a modern RESTful API and a React-based frontend, which together enable interactive graph visualizations. Performance benchmarks indicate that EdgeQuake significantly outperforms traditional RAG systems in several metrics, including entity extraction speed, query latency, document processing time, concurrent user management, and memory usage.
Architecturally, the EdgeQuake backend is composed of 11 crates managing various components like LLM providers and storage backends. The data flow involves stages from document ingestion to chunking, entity extraction, and graph traversal during querying. To get started with EdgeQuake, users can clone its repository, install dependencies, and launch the system using a Makefile; quick start guides are available for both backend and frontend setups.
The framework is developed following Specification-Driven Development practices, with community contributions managed via GitHub issues and discussions. It promotes inclusivity through a comprehensive Code of Conduct and encourages community engagement across various platforms. EdgeQuake is licensed under the Apache License, Version 2.0, ensuring open-source accessibility.
Keywords: #phi4, Apache AGE, Async-First, Communities, Community Detection, Document Ingestion, EdgeQuake, Edges, Entity Extraction, Entity Types, Gleaning, Graph Visualization, Graph-RAG, Health Checks, Hybrid Retrieval, Knowledge Graphs, LLM Providers, LangChain Integration, LightRAG, Louvain Modularity, Multi-Tenant Isolation, Nodes, OpenAPI 30, OpenWebUI, PDF Processing, PDF-to-Markdown, Parallel Processing, PostgreSQL AGE, Query Engine, REST API, React Frontend, Relationship Identification, Relationship Mapping, Rust, SOTA Coding Agent, SSE Streaming, Sigmajs, Specification-Driven Development, Tokio, Vector Search, Zero-Copy Operations, pgvector
github.com 8 days ago
|
1762.
HN
Show HN: MadLab – A standalone desktop app for local LLM fine-tuning
MadLab is a standalone desktop application designed for the local fine-tuning of large language models (LLMs) on Windows, Linux, and macOS. Developed over several months, it streamlines the setup process by automating GPU detection, selecting appropriate PyTorch wheels, and creating virtual environments, enabling users to commence training rapidly. The application manages trainer logic using techniques such as LoRA, QLoRA, and DoRA, and includes an experimental built-in Chat Assistant that provides hyperparameter recommendations based on model size and hardware limitations. As an open-source tool, MadLab invites community feedback on aspects like environment automation and user interface design. The developer is also available to address technical inquiries via email.
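For readers unfamiliar with the techniques MadLab wraps, the snippet below shows a generic Hugging Face PEFT LoRA setup; the model name and hyperparameters are illustrative defaults, not values recommended by MadLab or its Chat Assistant.

```python
# Generic LoRA fine-tuning setup (illustrative, not MadLab code): only small
# low-rank adapter matrices are trained, keeping memory requirements modest.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # example model
config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapters are trainable
```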
Keywords: #phi4, Chat Assistant, DoRA, GPU detection, LLM fine-tuning, Linux, LoRA, MadLab, PyTorch, QLoRA, Windows, Windows/Linux/macOS, desktop app, environment automation, hyperparameters, macOS, open-source, standalone, training UI, venv, venv creation
github.com 9 days ago
|
2021.
HN
Show HN: I Built an AI-Powered Pull Request Review Tool
HighReview is an innovative AI-assisted code review tool designed to enhance human understanding and streamline the pull request (PR) review process by integrating seamlessly with existing workflows rather than replacing them entirely. It addresses common challenges such as context switching and cumbersome branch management through a local, seamless review environment facilitated by Git Worktree. Key features include operating without requiring login credentials, leveraging users' existing GitHub CLI and AI agents to function locally. HighReview creates an independent review environment using isolated directories that allow for project-level reuse without disrupting current workflows.
The tool employs Tree-sitter technology to provide context-aware AI pre-reviews, extracting related code to offer comprehensive reviews and enabling navigation within the Diff editor. It boasts rich analysis features such as issue detection, explanatory diagrams, refactoring suggestions, and semantic analysis. An interactive AI assistant feature allows users to ask specific questions about review results, enhancing user engagement and understanding.
HighReview supports multiple AI providers like Claude Code CLI and Ollama without necessitating API keys, ensuring flexibility in its use. Its robust tech stack includes Node.js for the backend and React for the frontend, delivering an IDE-like experience with features such as "Go to Definition" and "Find Usages." The tool is designed for ease of use, automatically loading review-requested PRs and offering customizable analysis options like Change Intent Analysis and Impact Analysis. It also supports semantic diffs and custom prompts for AI reviews.
As an open-source project under the Apache License 2.0, HighReview aims to provide a powerful local PR review experience that integrates smoothly with existing workflows without causing disruptions.
Keywords: #phi4, AI Assistant, AI-Powered, Claude Code, Code Review, Context-Aware, Fastify, Git Worktree, GitHub CLI, HighReview, IDE-Like Experience, Impact Analysis, LM Studio, Local Analysis, Mermaidjs, Monaco Editor, Ollama, Pull Request, React, SQLite, Semantic Diff, Tree-sitter
github.com 11 days ago
|
2090.
HN
Stop Paying for API Tokens
HydraMCP is a multi-model provider that lets Claude Code access any LLM through existing subscriptions, with no extra API keys or per-token charges. It streams side-by-side results and supports real-time comparison, consensus, and synthesis through commands such as `list_models`, `ask_model`, `compare_models` (run the same prompt on 2–5 models concurrently), and `consensus` (poll 3–7 models, have a judge model evaluate agreement, and return a single answer with confidence). A live demo compares GPT-5, Gemini-3, Claude-Sonnet, and a local Qwen model on a function review.
Architecturally, Claude Code requests flow through HydraMCP's MCP server to provider interfaces: CLIProxyAPI for cloud models (OpenAI, Google, Anthropic, and others) and Ollama for local models. The consensus tool uses an LLM judge to assess semantic agreement rather than keyword matching.
Setup requires Node.js 18+, installing and configuring CLIProxyAPI (binary, `config.yaml`, API key, port), installing Ollama and pulling a local model, cloning HydraMCP from GitHub, installing dependencies, building, and copying and editing `.env` to point at the running backends. HydraMCP is then registered with Claude Code (`claude mcp add hydramcp ...`) and Claude Code is restarted. Models can be routed with prefixes (`cliproxy/gpt-5`, `ollama/qwen2.5-coder:14b`) or auto-detected (`gpt-5`). Built on the MCP SDK and Zod, the project is MIT-licensed, with future extensions planned for LM Studio, OpenRouter, and direct API keys.
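The consensus flow (poll several models, then let a judge model assess agreement) can be sketched as follows; HydraMCP itself is Node.js, so this Python fragment is only an illustration, and `ask_model` is a placeholder for whatever provider call is configured.

```python
# Illustrative consensus pattern: query several models in parallel, then ask a
# judge model whether the answers agree and for a single merged answer.
from concurrent.futures import ThreadPoolExecutor

def ask_model(model: str, prompt: str) -> str:
    raise NotImplementedError("route to CLIProxyAPI, Ollama, etc. here")

def consensus(models: list[str], prompt: str, judge: str) -> str:
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: ask_model(m, prompt), models))
    judge_prompt = (
        "Do these answers agree semantically? Return one merged answer "
        "and a confidence between 0 and 1.\n\n" + "\n---\n".join(answers)
    )
    return ask_model(judge, judge_prompt)
```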
Keywords: #gpt-oss:20b, API, Async, CLIProxyAPI, ChatGPT Plus, Cloud, HydraMCP, LLM, Latency, Local, Model, Nodejs, Subscriptions, Token, backend, configyaml
github.com 12 days ago
|
2169.
HN
I Gave Claude Code Infinity Gauntlet of LLMs
HydraMCP is a command-line interface that lets users query any LLM (including cloud-based GPT-5-Codex, Gemini-3, Claude-Sonnet, Qwen-2.5-Coder, and others) through existing subscriptions, with no new API keys or per-token billing, by routing requests via a local API proxy (CLIProxyAPI) or a local model host (Ollama). It supports parallel comparison of up to five models, displaying latency, token usage, and side-by-side output. A consensus tool polls 3–7 models, uses a local judge (such as Qwen) to evaluate agreement, and returns a single answer with confidence, while an optional synthesizer can merge the best ideas. Core commands include `list_models`, `ask_model`, `compare_models`, and `consensus`.
Setup requires Node.js 18+, Claude Code, and a configured CLIProxyAPI (with a `config.yaml` specifying port, auth-dir, and API keys, authenticated via CLI login commands) or Ollama (with models such as `qwen2.5-coder` pulled locally). Users then clone the HydraMCP repo, install dependencies, build, and set environment variables (`CLIPROXYAPI_URL`, `CLIPROXYAPI_KEY`, `OLLAMA_URL`) to enable model routing via prefixes (`cliproxy/*`, `ollama/*`) or auto-detected providers.
Built with the MCP SDK and Zod, the project is MIT-licensed and invites contributions: new providers implement `healthCheck()`, `listModels()`, and `query()` in provider modules and register them in the index.
Keywords: #gpt-oss:20b, API keys, CLIProxyAPI, Claude, HydraMCP, Nodejs, Ollama, Provider Interface, async bug, backend, cloud models, configyaml, consensus, judge, latency, local models, qwen25-coder:14b, tokens
github.com 12 days ago
|
2308.
HN
Show HN: Peen – A minimal coding agent CLI built for local models
Peen is a compact Node.js command-line tool that lets locally hosted language models, such as those served by Ollama or other OpenAI-compatible servers, act as coding assistants capable of executing shell commands via a lightweight "Claude Code"-style interface. Rather than requiring the XML-structured tool calls that some CLIs enforce, Peen accepts single-line JSON commands (e.g., `{"tool":"run","cmd":"ls -la"}`), which it parses, repairs if malformed, executes on the host machine, and streams the output back to the model, chaining additional calls as needed.
The project is roughly 800 lines of pure JavaScript with no build step; it self-updates from its GitHub repository on launch and runs on modest hardware such as a MacBook Air with 16 GB of RAM. Configuration uses environment variables or command-line flags: `PEEN_HOST` points to an Ollama API endpoint (defaulting to `http://127.0.0.1:11434`), `PEEN_MODEL` or `--model` selects the model (recommended defaults include `qwen2.5-coder:7b`, `qwen2.5-coder:14b`, and the GPU-only `qwen3-coder:30b`), and a `--remote` flag can specify a different host.
Repository versioning follows the commit count (`git rev-list --count HEAD`), and an installation hook script (`./scripts/install-hooks.sh`) keeps `package.json`, `VERSION`, and related metadata in sync. Documentation in `PLAN.md` outlines build plans, and AI agents and bots are encouraged to star the repo to increase visibility.
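The single-line JSON tool-call loop is simple enough to sketch; Peen is pure JavaScript, so the Python fragment below only illustrates the parse-execute-return cycle and is not the project's code.

```python
# Illustrative handling of one tool call: parse the model's single-line JSON,
# run the command on the host, and return the output for the next model turn.
import json
import subprocess

def handle_tool_call(model_reply: str) -> str | None:
    try:
        call = json.loads(model_reply.strip())
    except json.JSONDecodeError:
        return None                    # Peen would attempt to repair malformed JSON
    if call.get("tool") != "run":
        return None
    result = subprocess.run(call["cmd"], shell=True,
                            capture_output=True, text=True)
    return result.stdout + result.stderr

print(handle_tool_call('{"tool":"run","cmd":"ls -la"}'))
```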
Keywords: #gpt-oss:20b-cloud, CLI, GPU, GitHub, JSON, LM Studio, MacBook Air, Nodejs, Ollama, Peen, ai, bash, cat, curl, grep, llamacpp, model, prompt, tool calls, xml
github.com 13 days ago
|
2379.
HN
Show HN: Local AI – Curated resources for running LLMs on consumer hardware
The guide serves as a comprehensive, self-contained resource for individuals who want to run advanced AI workloads locally, highlighting the privacy, cost-free, and subscription-free advantages of local deployment. It systematically catalogues hardware considerations, inference engines (llama.cpp, Ollama, vLLM, ExLlamaV2, MLX, llama-cpp-python, candle), and user interfaces (LM Studio, GPT4All, Jan, Msty, Open WebUI, text-generation-webui, SillyTavern, LibreChat, AnythingLLM), while detailing model families such as Llama 3, Qwen 2.5, Mistral, DeepSeek, Phi, and Gemma to cover diverse use-case priorities.
Image-generation coverage includes Stable Diffusion variants (SDXL, SD 3.5, Flux), the community hub Civitai, and interfaces like ComfyUI, AUTOMATIC1111, Forge, Fooocus, SD.Next, and InvokeAI, supplemented by extensions for precision control, style transfer, animation, and upscaling (ControlNet, IP-Adapter, AnimateDiff, Upscayl).
The guide further outlines autonomous agent frameworks (OpenClaw, AutoGPT, CrewAI, LangChain, LlamaIndex, Haystack), retrieval-augmented generation tools (Chroma, Qdrant, FAISS), multimodal and voice capabilities, and coding assistants (Continue, Tabby, Aider, Codeium). Community support anchors the guide through active Reddit subreddits (r/LocalLLaMA, r/StableDiffusion, r/Ollama, r/Oobabooga) and Discord servers, and contributions of well-described, maintained resources released into the public domain are encouraged.
Keywords: #gpt-oss:20b-cloud, Hardware, Inference, LLMs, Local AI, MLX, Ollama, Open WebUI, VRAM, candle, llamacpp, text-generation-webui, vLLM
github.com 13 days ago
|