31.
HN
Show HN: AgentForge – Multi-LLM Orchestrator in 15KB of Python
AgentForge is a lightweight Python tool designed to streamline the orchestration of various Large Language Model (LLM) providers using a unified asynchronous interface. It allows seamless switching between providers like Claude, Gemini, OpenAI, and Perplexity with minimal effort by altering just one parameter. Addressing challenges such as provider lock-in, excessive framework complexity, and production inefficiencies, AgentForge features token-aware rate limiting, prompt templates, retry mechanisms with backoff strategies, and cost-efficient caching and routing.
The tool's architecture includes multiple layers: an Interface Layer (comprising CLI, REST API, and Streamlit Visualizer), a Core Orchestration layer with components like AIOrchestrator and Rate Limiter, an Agents Framework featuring the ReAct Agent Loop and Multi-Agent Mesh, Provider Adapters, a Tools System, and Observability. This structure supports easy testing, deployment, and integration into existing systems.
AgentForge is designed for rapid setup, allowing users to go from installation to making their first API call in under five minutes. It supports seamless provider switching and demonstrates substantial cost savings—up to 89% through effective caching and routing strategies. Built with modern tools such as HTTPX for asynchronous HTTP requests, it integrates seamlessly into continuous integration/continuous deployment (CI/CD) workflows via GitHub Actions.
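To make the one-parameter switching claim concrete, a usage sketch is shown below; the AIOrchestrator name comes from the architecture description, but the constructor arguments and completion method are assumptions about how such an interface could look, not AgentForge's documented API.

```python
# Hypothetical sketch of one-parameter provider switching; the AIOrchestrator
# class name appears in the summary above, but the constructor arguments and
# complete() method are assumptions, not the project's actual API.
import asyncio

from agentforge import AIOrchestrator  # assumed import path


async def main() -> None:
    # Switching providers would amount to changing this single argument.
    for provider in ("openai", "claude", "gemini"):
        orchestrator = AIOrchestrator(provider=provider)
        reply = await orchestrator.complete("Summarize RAG in one sentence.")
        print(provider, "->", reply)


asyncio.run(main())
```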
The project is MIT-licensed, encouraging contributions and collaborations while showcasing its effectiveness in significantly reducing costs—a fact supported by testimonials from industry professionals. AgentForge positions itself as an essential solution for businesses aiming to utilize multiple LLMs efficiently without being confined to a single provider's API ecosystem.
Keywords: #phi4, API Keys, AgentForge, Architecture Decisions, Async Interface, Benchmarks, Consulting, Cost Optimization, EnterpriseHub, GitHub Actions, Implementation, LLM, Licensing, Multi-Agent Mesh, Orchestrator, Prompt Templates, Provider Switching, Python, RAG, Rate Limiting, Testing, Tool Execution, Web Scraping
github.com 3 hours ago
|
132.
HN
Why agent memory needs more than RAG (2026 paper and structure over similarity)
The 2026 paper "Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation" critiques the use of Retrieval-Augmented Generation (RAG) for managing agent memory, emphasizing its inefficiencies in handling structured data due to an over-reliance on similarity metrics. This approach often leads to redundant results and fragmented retrieval of temporally linked evidence. To address these limitations, the authors propose shifting from similarity-based methods to structure-driven approaches that leverage entities, relationships, and timelines for better information retrieval.
The paper introduces xMemory, a system designed with a four-level hierarchy (from messages to themes) using LLM-generated summaries. While xMemory outperforms existing systems on benchmarks, it shows brittleness when faced with formatting deviations and update failures. In contrast, Neotoma adopts a deterministic schema-first approach without relying on LLMs for critical operations. It ensures consistent retrieval by employing typed entities and explicit relationships, efficiently supporting both semantic and structural queries.
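As an illustration of the schema-first idea, a minimal sketch of typed entities and explicit, timestamped relationships is shown below; the field names are assumptions for illustration, not Neotoma's actual data model.

```python
# Illustrative schema-first memory records: typed entities plus explicit,
# timestamped relationships, so retrieval can filter on structure instead of
# relying only on embedding similarity. Field names are assumptions.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entity:
    entity_id: str
    entity_type: str          # e.g. "person", "project", "document"
    name: str


@dataclass(frozen=True)
class Relation:
    source_id: str
    relation: str             # e.g. "works_on", "mentions"
    target_id: str
    observed_at: datetime     # supports timeline queries


def related(entities, relations, entity_id, relation):
    """Deterministic structural lookup: follow typed edges, no similarity scores."""
    targets = {r.target_id for r in relations
               if r.source_id == entity_id and r.relation == relation}
    return [e for e in entities if e.entity_id in targets]
```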
The paper highlights that xMemory is well-suited for scenarios involving conversational data where emergent structure is necessary, whereas Neotoma excels in applications demanding traceability and predefined schemas. Overall, the authors advocate for a schema-first methodology to overcome RAG's brittleness, ensuring more reliable retrieval of agent memory.
Keywords: #phi4, Agent memory, Neotoma, RAG, brittleness, conversation stream, determinism, embeddings, entity graph, hierarchy, retrieval, schema-first, semantic retrieval, similarity, structural retrieval, structure, xMemory
markmhendrickson.com 9 hours ago
|
194.
HN
I got tired of on-device LLMs crashing my apps, so I built a managed runtime
Edge-Veda is a sophisticated runtime environment specifically crafted for Flutter applications to enable sustainable on-device artificial intelligence capabilities, encompassing text, vision, speech, and Retrieval-Augmented Generation (RAG) processing. This solution overcomes typical challenges associated with other on-device AI implementations such as thermal throttling, memory spikes, and the absence of runtime visibility that often result in application crashes. By running entirely on the device without requiring cloud dependencies, Edge-Veda ensures privacy during inference since it eliminates network calls.
Key features include maintaining persistent model instances to support long sessions while dynamically adapting to constraints like thermal limits, memory availability, and battery status. It provides structured observability for debugging via performance tracing tools and incorporates a Dart SDK with Flutter integration, facilitating access to C API functions and various AI models. The architecture underpinning Edge-Veda employs persistent workers for text, vision, and speech tasks to keep model data in memory across sessions while using runtime policies to manage resource constraints through adaptive degradation strategies.
Edge-Veda's runtime supervision is managed by compute budget contracts and adaptive profiles that adjust the quality of service based on device performance metrics. A central scheduler handles concurrent workloads with priority-based degradation. Its current capabilities include core inference tasks like multi-turn chat sessions, real-time speech recognition, embedding pipelines for structured output generation, and vector search using pure Dart implementations.
For integration, users can easily add Edge-Veda to their Flutter projects through a simple dependency in `pubspec.yaml`. It supports diverse use cases such as text generation, streaming transcription, multi-turn conversations, tool calling, and continuous vision inference. The project encourages contributions for platform validation, particularly on Android, enhancements in runtime policies, trace analysis tools, model support, and example app development. Edge-Veda's structure includes C++ core components for AI processing, Dart SDK integration, and scripts for building iOS frameworks, targeting developers focused on creating privacy-sensitive applications, on-device AI assistants, continuous perception apps, and long-running edge agents.
Keywords: #phi4, Android, C API, CPU, Dart SDK, Edge-Veda, Flutter, GPU, QoS levels, RAG, adaptive budgeting, chat templates, embeddings, iOS, memory management, model management, observability, on-device AI, performance tracing, platform validation, privacy-sensitive, runtime, speech recognition, text generation, thermal throttling, tool calling, vector search, vision inference
github.com 21 hours ago
https://news.ycombinator.com/item?id=47054873 20 hours ago
https://news.ycombinator.com/item?id=47055576 20 hours ago
|
206.
HN
Run LLMs locally in Flutter with <200ms latency
Edge-Veda is a managed on-device AI runtime developed specifically for Flutter, designed to efficiently run large language models (LLMs) locally across various tasks such as text processing, vision, speech recognition, and retrieval-augmented generation with sub-200ms latency. The platform operates independently of cloud services, enhancing privacy by ensuring data remains local. It addresses typical challenges in on-device AI applications like thermal throttling, memory spikes, unstable long sessions, and limited runtime visibility.
Key features include sustainable performance through adaptive budget profiles that adjust to device constraints like thermal pressure, battery level, and available memory, using a central scheduler for workload management with priority-based degradation. Edge-Veda maintains persistent contexts by keeping models in memory across sessions, ensuring stability during prolonged use. It provides structured performance tracing and offline analysis tools for better observability and debugging.
The runtime supports various functionalities, including text generation, multi-turn chat management, on-device speech recognition, vector index search, and function calling with tool registries and schema validation. The Smart Model Advisor offers tailored model recommendations based on device profiles, optimizing performance according to specific hardware characteristics such as RAM and processor type. Currently validated for iOS devices using Metal GPU, Edge-Veda plans to extend support to Android CPU and Vulkan GPU.
With a codebase of approximately 22,700 lines across different components, the architecture integrates Flutter Dart SDK with persistent workers for text, vision, and speech models, backed by a central scheduler and performance monitoring services. It is designed to facilitate privacy-sensitive, long-running, or offline-first AI applications like voice assistants and continuous perception apps.
Edge-Veda's roadmap includes future developments such as Android runtime validation, integration of text-to-speech capabilities, semantic perception APIs, observability dashboards, support for NPU/CoreML backends, and model conversion tools. The project is open for contributions in areas like platform validation, runtime policy improvements, trace analysis, and expanding model support, utilizing Apache 2.0 licensing and building upon the llama.cpp and whisper.cpp libraries.
Keywords: #phi4, Android, C API, Dart SDK, Edge-Veda, Flutter, GPU acceleration, LLMs, RAG, adaptive budgeting, compute contracts, iOS, memory management, model recommendations, observability, on-device AI, performance tracing, privacy-sensitive, runtime supervision, speech recognition, text generation, thermal throttling, vision inference
github.com 23 hours ago
|
296.
HN
Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File
Wax is a file-based solution designed to optimize Retrieval-Augmented Generation (RAG) on Apple Silicon devices by eliminating the need for external servers or APIs, thereby simplifying AI memory management. It achieves sub-millisecond retrieval times and supports fast vector search through Metal GPU utilization, specifically benefiting devices like the M1 Pro. With its single-file architecture, Wax offers offline capabilities, crash recovery, and enhanced privacy as all operations occur locally on the device. The solution is versatile, accommodating various data types such as text, photos, and videos, which enhances its applicability across different domains including AI assistants, privacy-sensitive applications, and robust search tools.
Wax incorporates advanced features like query-adaptive hybrid search for optimized retrieval, tiered memory compression to manage context efficiently, and deterministic token budgeting to ensure reproducibility of results. These capabilities make it well-suited for offline-first apps, research tooling, and workflows that demand durable state management without network dependencies. The solution operates on Swift 6.2, targeting iOS/macOS 26 environments with Apple Silicon architecture.
Getting started with Wax is straightforward: users can integrate it into their projects via a package manager, select the appropriate memory type (text, photo, or video), and utilize simple functions for data ingestion and recall. The comprehensive file format of Wax includes integrated documents, embeddings, search indices, logs, metadata, and entity graphs in an append-only structure that ensures integrity with checksum verification and dual headers facilitating atomic updates.
Compared to alternatives such as Chroma, Core Data + FAISS, and Pinecone, Wax stands out for its single-file nature, offline functionality, crash safety, GPU acceleration, serverless operation, and native Swift integration. It delivers deterministic RAG functionalities that are particularly advantageous in environments requiring robust, privacy-focused, and resilient AI capabilities. Developers interested in contributing can engage with the project through GitHub and can explore additional tests related to MiniLM CoreML functionalities.
Keywords: #phi4, AI, Apple Silicon, BM25, CoreML, GPU, HNSW index, Metal GPU, MiniLM, RAG, SQLite, Swift, USearch, WAL Ring Buffer, Wax, crash-safe, deterministic, document payloads, embeddings, hybrid search, iOS, macOS, memory, offline, privacy, query-adaptive, reproducible retrieval, tiered compression, token budgeting, vector search
github.com a day ago
https://github.com/christopherkarani/Wax a day ago
https://www.pangram.com/history/49335ddf-118d-43e4-9340 a day ago
https://github.com/christopherkarani/Wax/blob/ a day ago
https://github.com/christopherkarani/Wax?tab=readme-ov- a day ago
https://github.com/christopherkarani/Wax/blob/ a day ago
https://github.com/tobi/qmd 14 hours ago
|
390.
HN
Show HN: CodeGraph CLI – Chat with your codebase using graph-augmented RAG
CodeGraph CLI is an advanced tool designed to enhance codebase comprehension through semantic search and analysis by integrating technologies like tree-sitter for abstract syntax tree parsing, SQLite for managing dependency graphs, and LanceDB for vector embeddings. This combination allows it to maintain the structural relationships within code by merging vector search with breadth-first search graph traversal. Among its key features are semantic search, which enables code identification based on meaning rather than exact matches; impact analysis that evaluates multi-hop dependencies prior to changes; and interactive graph visualization using HTML and Graphviz DOT exports. Additionally, it offers a browser-based explorer for visual navigation supplemented by Mermaid diagrams and AI explanations, along with a conversational chat feature facilitating natural language coding sessions through context-aware retrieval augmented generation (RAG). It also employs a multi-agent system via CrewAI to handle tasks like autonomous code generation, refactoring, and analysis, as well as automatically generating professional project documentation. CodeGraph CLI supports auto onboarding by creating AI-generated README files from the code graph and ensures data privacy with its local-first design.
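The core retrieval pattern, combining vector search seeds with a bounded breadth-first traversal of the dependency graph, can be sketched as follows; the store interfaces and function names are assumptions, not CodeGraph CLI's internals.

```python
# Illustrative graph-augmented retrieval: seed results come from a vector
# search, then a bounded breadth-first traversal of the dependency graph pulls
# in structurally related code. Names and store interfaces are assumptions.
from collections import deque


def graph_augmented_search(query, vector_store, dep_graph, k=5, max_hops=2):
    # 1. Semantic seeds: top-k chunks by embedding similarity.
    seeds = vector_store.search(query, k=k)            # assumed interface
    seen = {s.node_id for s in seeds}

    # 2. BFS over the dependency graph to preserve structural context.
    queue = deque((s.node_id, 0) for s in seeds)
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in dep_graph.neighbors(node_id):   # assumed interface
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))

    return seen  # node ids to hydrate into the RAG context
```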
To get started with CodeGraph CLI, users install it using pip, configure their preferred language model provider (LLM) either interactively or via command line, and index a project to parse and construct its dependency graph. The tool offers diverse commands for search, impact analysis, visualization, chat interactions, among others. It supports local and cloud-based LLM providers such as Ollama, OpenAI, Anthropic, Groq, Gemini, and OpenRouter. Additionally, it provides various embedding models that range from simple keyword-based hashes to advanced options like Qodo-Embed-1-1.5B.
The architecture of CodeGraph CLI comprises multiple layers: a CLI Layer for command execution, GraphStore utilizing SQLite for dependency management, and VectorStore employing LanceDB for vector embeddings. The tool also features an LLM Adapter and various task-specific agents responsible for file operations, code generation, and analysis. Its open-source nature, under the MIT license, encourages collaboration and distribution within the development community. Developers can set up a virtual environment, install dependencies via pip, and access the full suite of commands organized into categories like configuration, project management, and documentation export, offering a comprehensive solution for modern software development environments.
Keywords: #phi4, AI-generated README, BFS traversal, CodeGraph CLI, CrewAI, LLM providers, LanceDB, SQLite, auto-generate docs, browser-based explorer, codebase navigation, conversational coding, dependency analysis, embedding models, file rollback, graph-augmented RAG, impact analysis, local-first architecture, multi-agent system, project documentation, semantic code search, semantic search, tree-sitter, vector embeddings, visual code explorer
github.com a day ago
|
456.
HN
Graph-based multi-agents smash long-context benchmarks–89% MMLU-Pro on 8B models
The document describes the Graph of Agents (GoA), a graph-based multi-agent system that performs exceptionally well in long-context benchmarks, achieving 89% accuracy on MMLU-Pro with models having 8 billion parameters. It outlines the implementation and evaluation process, starting from setting up the environment using `conda` based on an `environment.yml` file to downloading necessary datasets from Hugging Face. The inference process involves a Python script that generates predictions for evaluation purposes. GoA is compared with baselines such as Chain-of-Agents (CoA) and RAG, offering adjustable parameters like cluster size for testing variations. Evaluation scripts are used to assess results for models such as `qwen_8b` or `llama3_8b`, though they do not consider context window and temperature details. The system allows qualitative analysis by saving detailed outputs if enabled. The implementation of GoA is primarily derived from an existing Chain-of-Agents codebase found on GitHub, suggesting a foundation in established methodologies within the field.
Keywords: #phi4, CUDA_VISIBLE_DEVICES, Chain-of-Agents, GoA inference, Graph of Agents, Graph-based multi-agents, LongBench, baselines, conda env create, environmentyml, eval_longbenchpy, goa_cluster_size, huggingface pipeline, model_name, qualitative analysis, rag, result_longbenchpy
github.com 2 days ago
|
620.
HN
AI to SWE ratio convergence and where AI Jobs are
From January 2023 to January 2026, a notable convergence between Artificial Intelligence (AI) and Software Engineering (SWE) roles emerged, as shown by an analysis of job postings. Although SWE job postings increased by 13.5% overall, this growth was driven almost entirely by the Technology sector (+64.9%) and Financial Services (+29.1%), which together accounted for over half of all such postings. Excluding these sectors, eight of eleven industries saw SWE job postings decline. The ratio of AI to SWE job postings expanded from 0.28 to 0.66 over this period, with AI postings growing far faster than SWE postings: a 96.1% rise versus 13.5%.
AI hiring is widespread across sectors, with robust growth in Healthcare (+54%), Industrials (+50%), and Energy (+68%). Demand for skills related to generative AI tools, including large language models (LLMs), Copilot, and retrieval-augmented generation (RAG), has surged, indicating their rising importance alongside traditional machine learning frameworks such as PyTorch and TensorFlow. This growing significance is mirrored in a median salary premium of $26,000 for AI roles over SWE positions.
The analysis underscores the necessity to move beyond aggregate SWE job counts towards more accurate sector-adjusted metrics or equal-weighted averages due to their misleading nature. It also advocates for monitoring the AI/SWE convergence rate as an essential indicator of future hiring trends. For software engineers, acquiring practical generative AI skills is increasingly important to enhance career prospects and achieve salary advantages. The study's methodology included analyzing 45.4 million job postings using advanced trend decomposition techniques to manage seasonal variations and provided insights through tracking mentions of AI-related technologies.
Keywords: #phi4, AI adoption, AI-SWE convergence, Copilot, Financial Services, LLMs, PyTorch, RAG, Revealera database, STL decomposition, Simpson’s paradox, Technology, TensorFlow, equal-weighted average, generative AI tools, hiring growth, job market trends, job postings, salary premium, seasonal noise, sector analysis, software engineering, trend analysis, volume-weighted aggregate
revealera.substack.com 2 days ago
|
764.
HN
SnapLLM: Switch between local LLM in under 1ms Multi-model&-modal serving engine
SnapLLM is a Large Language Model (LLM) inference engine designed to switch between multiple loaded models in under a millisecond, eliminating the time-consuming unloading and reloading typical of traditional systems. By keeping several models resident in memory, SnapLLM achieves rapid model switching through its vPID architecture. It supports a variety of model types, including text LLMs such as Llama and Mistral as well as vision and diffusion models, on both GPU and CPU platforms.
A standout feature is its compatibility with OpenAI's API, offering seamless integration for users accustomed to the existing ecosystem. The engine includes a React-based desktop application that provides tools such as A/B comparisons and context cache management, enhancing user experience in managing different models. Performance benchmarks demonstrate impressive metrics: model switch time is around 0.02 milliseconds, first token latency at approximately 50 milliseconds, and variable token generation speeds depending on GPU capabilities.
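Because the server exposes an OpenAI-compatible API, a standard OpenAI client pointed at a local endpoint should be able to drive it; the base URL, port, and model identifiers in the sketch below are assumptions rather than values taken from SnapLLM's documentation.

```python
# Sketch of talking to an OpenAI-compatible local server; the base_url, port,
# and model identifiers are assumptions, not values from SnapLLM's docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Switching between preloaded models would just mean changing the model field.
for model in ("llama-3-8b", "mistral-7b"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "One sentence on KV caches."}],
    )
    print(model, "->", response.choices[0].message.content)
```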
SnapLLM's installation requires several prerequisites, including Visual Studio for Windows, GCC/Clang for Linux, CUDA for GPU acceleration, CMake, and Node.js for the desktop application. Detailed guidance is provided to assist users in building from source across different operating systems. Once set up, starting the SnapLLM server involves straightforward commands that can include preloading models.
The project offers a comprehensive API suite supporting operations such as model loading, switching, text or image generation, and vision input analysis. Additionally, it provides command-line interface (CLI) options for various tasks including server management, text processing with LLMs, and image-related functionalities. As an open-source initiative under the MIT License, SnapLLM invites contributions to enhance features, address bugs, and improve documentation, while encouraging sponsorship to support its ongoing development. Created by Mahesh Vaikri at Aroora AI Labs, SnapLLM aims to empower users with efficient model management capabilities within the AI community.
Keywords: #phi4, A/B comparison, CLI, CMake, CUDA, GPU/CPU hybrid, KV cache, LLM inference, Nodejs, OpenAI API, RAG, React, SnapLLM, architecture, context caching, contributing, demo videos, desktop UI, diffusion models, installation, llamacpp, memory efficiency, model management, model switching, multi-domain assistant, multi-model, performance benchmarks, rapid switching, server locally, serving engine, sponsors, stable-diffusioncpp, sub-millisecond, text LLMs, vPID, vision models
github.com 4 days ago
https://vimeo.com/1157629276 4 days ago
https://vimeo.com/1157624031 4 days ago
https://github.com/snapllm/snapllm 4 days ago
https://arxiv.org/submit/7238142/view 4 days ago
|
768.
HN
I built an AI that runs offline on Android (no cloud)
EdgeDox is an innovative offline AI document assistant designed to function solely on Android devices, eliminating the need for cloud reliance by processing documents locally. This ensures complete privacy and control over user data as it operates without requiring any internet connection post-setup and does not necessitate user accounts. EdgeDox supports various file types including PDFs, text files, and markdown documents, enabling users to query these documents directly through a local Retrieval-Augmented Generation (RAG) system. This design prioritizes speed, accuracy, and privacy by keeping all data confined to the device.
Optimized for mobile environments, EdgeDox is particularly beneficial for students, developers, professionals, and individuals who prioritize their privacy. It offers significant features such as seamless navigation through extensive documents, providing answers about intricate texts, and ensuring functionality even in airplane mode. With no reliance on cloud storage or external systems, EdgeDox stands out for managing confidential work documents, personal notes, and sensitive files without any data sharing or tracking, making it an ideal solution for users concerned with data security and privacy.
Keywords: #phi4, ARM CPUs, Android, Confidentiality, Data Control, EdgeDox, Financial Files, Instant Responses, Legal Files, Local Processing, Markdown, Medical Files, Offline AI, PDFs, Privacy, Query Specs, RAG, Summarize Notes, Surveillance-Free, TXT files, Technical Documentation
play.google.com 4 days ago
|
864.
HN
Show HN: Clonar – A Node.js RAG pipeline with 8-stage multihop reasoning
Clonar is an advanced Retrieval-Augmented Generation (RAG) system designed to enhance query processing through high-precision, multihop reasoning. Unlike conventional RAG systems that rely on a single retrieval-synthesis cycle often leading to incomplete or inaccurate results, Clonar utilizes an 8-stage iterative workflow. This begins with pre-retrieval reasoning and incorporates clarification and critique stages, ensuring responses are accurate, well-grounded, and citation-backed. Its architecture allows each stage in the reasoning loop to be dynamically conditioned, thereby setting a new standard for reliability and precision in AI-powered search systems. Clonar is backend-based, accessible via HTTP requests through tools like curl or Postman, eliminating the need for a frontend interface. This approach minimizes errors known as "hallucinations" and significantly improves the system's capability to manage complex queries effectively.
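Since Clonar is driven entirely over HTTP, a query would look roughly like the sketch below; the endpoint path and the request and response fields are hypothetical, chosen only for illustration.

```python
# Hypothetical request against a Clonar-style backend; the URL path and the
# request/response fields are assumptions made for illustration only.
import requests

resp = requests.post(
    "http://localhost:3000/query",          # assumed endpoint
    json={"question": "How does the critique stage revise the draft answer?"},
    timeout=120,
)
resp.raise_for_status()
payload = resp.json()
print(payload.get("answer"))
print(payload.get("citations"))
```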
Keywords: #phi4, 8-stage reasoning loop, API, Clonar, HTTP client, Nodejs, RAG, agentic workflow, backend, citations, complex queries, dynamic conditioning, grounded answers, hallucinations, high-precision reasoning, iterative flow, multihop reasoning, pipeline, retrieval-augmented generation
github.com 4 days ago
https://github.com/clonar714-jpg/clonar 4 days ago
|
1033.
HN
True, Relevant, and Wrong: The Applicability Problem in RAG
Retrieval Augmented Generation (RAG) systems aim to enhance AI response accuracy by using documented sources, but face significant challenges due to what is identified as the "applicability problem." This issue arises when RAGs provide correct information that is contextually inappropriate, often because of complex and multi-branching policies within expanding corporate knowledge bases. The primary difficulty shifts from verifying source support to ensuring statements' relevance in specific contexts, such as geographical region, eligibility criteria, or product version. A common failure mode occurs when RAG systems combine multiple valid but incompatible policy fragments into a single response, resulting in coherent yet contradictory and impractical "franken-answers" for real-world scenarios.
To mitigate these challenges, the article proposes enhancing knowledge representation by incorporating explicit metadata—a meta-layer—that outlines conditions like temporal validity and scope. This approach involves extracting signals from user queries to identify implicit requirements and employing disambiguation processes that direct questions to suitable knowledge sources. Such improvements aim to enable a multi-agent system capable of delivering contextually accurate responses. The article suggests developing a comprehensive framework to resolve the applicability problem by refining RAG architectures with mechanisms for encoding, recognizing, and routing based on explicit applicability conditions, thereby improving their real-world utility and reliability in information provision.
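A minimal sketch of such an applicability gate, applied between retrieval and generation, might look like the following; the metadata fields (region, validity dates) are illustrative examples of applicability conditions rather than a prescribed schema.

```python
# Sketch of applicability filtering: retrieved chunks carry explicit condition
# metadata, and only chunks whose conditions match the query context reach the
# generator. Field names are illustrative, not a prescribed schema.
from datetime import date


def applicable(chunk_meta: dict, context: dict) -> bool:
    if chunk_meta.get("region") not in (None, context.get("region")):
        return False
    today = context.get("as_of", date.today())
    if chunk_meta.get("valid_from") and today < chunk_meta["valid_from"]:
        return False
    if chunk_meta.get("valid_to") and today > chunk_meta["valid_to"]:
        return False
    return True


def filter_retrieved(chunks, context):
    # chunks: iterable of (text, metadata) pairs from the retriever
    return [(text, meta) for text, meta in chunks if applicable(meta, context)]
```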
Keywords: #phi4, Retrieval Augmented Generation, authoritative grounding, authority conditions, compositional applicability, conditional truths, franken-answer, hallucinations, implicit conditions, policy branches, retrieval failure, scope constraints, temporal validity
www.pinecone.io 5 days ago
|
1148.
HN
Evaluation of RAG Architectures for Policy Document Question Answering
The study titled "Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering" investigates how effectively Retrieval-Augmented Generation (RAG) architectures can mitigate issues faced by Large Language Models (LLMs), such as generating factually incorrect outputs. Focusing on policy documents from entities like the CDC, this research emphasizes the importance of accuracy and integrity in responses. It compares a baseline Vanilla LLM with Basic RAG and Advanced RAG configurations using cross-encoder re-ranking, employing models including Mistral-7B-Instruct-v0.2 and all-MiniLM-L6-v2 to process CDC documents, evaluating their performance on faithfulness and relevance.
The findings reveal that Basic RAG significantly enhances the faithfulness of responses compared to Vanilla LLMs, with Advanced RAG achieving even greater accuracy. The study highlights two-stage retrieval mechanisms as crucial for domain-specific question answering but identifies challenges in document segmentation affecting multi-step reasoning tasks. Overall, it underscores the potential of RAG architectures to improve information integrity within public health policy domains.
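A minimal two-stage retrieval sketch in the spirit of the Advanced RAG configuration is shown below, assuming the sentence-transformers library; the cross-encoder checkpoint named here is a common public one and may differ from the models used in the study.

```python
# Two-stage retrieval sketch: a bi-encoder narrows the corpus, then a
# cross-encoder re-scores the candidates. Model checkpoints are illustrative.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = ["...policy chunk 1...", "...policy chunk 2...", "...policy chunk 3..."]
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)

query = "Who is eligible under the updated vaccination guidance?"
hits = util.semantic_search(bi_encoder.encode(query, convert_to_tensor=True),
                            corpus_emb, top_k=20)[0]

# Re-rank the candidates with the cross-encoder and keep the best ones.
pairs = [(query, corpus[h["corpus_id"]]) for h in hits]
scores = cross_encoder.predict(pairs)
reranked = sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True)[:5]
```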
Keywords: #phi4, Artificial Intelligence, CDC Documents, Chunking Strategies, Computational Linguistics, Cross-Encoder Re-ranking, Faithfulness, Hallucinations, Information Integrity, Information Retrieval, Large Language Models, Policy Document, Question Answering, RAG Architectures, Relevance, Retrieval-Augmented Generation
arxiv.org 6 days ago
|
1171.
HN
Show HN: I built an webpage to showcase Singapore's infra and laws
The "Singapore Intelligence RAG System" is an AI-driven platform designed to deliver precise information about Singapore's legal system, policies, historical events, and infrastructure by utilizing Retrieval-Augmented Generation (RAG) technology. It stands out due to its reliance on over 33,000 pages of meticulously curated data, which enhances accuracy compared to conventional large language models. The system's architecture comprises document ingestion, semantic embedding via BGE-M3, quick retrieval through FAISS with millisecond latency, and a robust triple-layer AI failover mechanism ensuring reliability. This failover includes Google Gemini 2.0 Flash as the primary model, Llama 3.3 managed by OpenRouter as secondary, and an additional Llama for fallback. The user interface employs a custom Framer Code Component that utilizes modern design elements such as glassmorphism effects, smooth hover animations, SVG icons, and San Francisco typography to create an engaging user experience. Local embedding inference is performed server-side to enhance privacy and performance without relying on external APIs.
Technologically, the system uses React with Framer Motion for the frontend, Flask and Gunicorn for handling RAG logic in the backend, FAISS for local vector search, and Sentence-Transformers BGE-M3 for embeddings. The text generation is managed by LLMs like Gemini 2.5 flash and Llama 3.3. For deployment, Hugging Face Spaces with Docker-based cloud hosting ensures scalability and ease of access.
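The retrieval layer described above can be approximated in a few lines; the FAISS index type and normalization choices in the sketch below are assumptions rather than the project's exact configuration.

```python
# Sketch of the retrieval layer: BGE-M3 dense embeddings indexed in FAISS with
# inner-product search. The index type and normalization are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
docs = ["...curated page 1...", "...curated page 2..."]

doc_emb = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_emb.shape[1])   # cosine via normalized vectors
index.add(doc_emb)

query_emb = model.encode(["What does the Employment Act cover?"],
                         normalize_embeddings=True)
scores, ids = index.search(query_emb, 2)
print([docs[i] for i in ids[0]])
```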
Setting up the platform requires installing specific Python packages such as Flask, FAISS CPU, Sentence-Transformers on the backend server, followed by running the necessary scripts post repository cloning for local development.
Keywords: #phi4, AI, BGE-M3, Docker, FAISS, Flask, Framer Motion, Google Gemini, Gunicorn, Hugging Face Spaces, LLMs, RAG, React, Singapore, backend, deployment, embeddings, frontend, glassmorphism, infrastructure, interactive UI, laws, legal system, policies, sentence-transformers, tech stack, triple-failover, vectorization, webpage
github.com 6 days ago
|
1209.
HN
Show HN: DocForge – Multi-Agent RAG That Fact-Checks Its Own Answers
DocForge is an advanced Multi-Agent Retrieval-Augmented Generation (RAG) system designed to provide precise, verified responses through a sophisticated multi-agent architecture. It features a routing agent that classifies queries by complexity to optimize search queries, a retrieval agent that adapts the number of documents fetched based on query requirements and implements retry logic, and an analysis agent that synthesizes coherent answers from multiple sources using chain-of-thought reasoning. Additionally, a validation agent ensures factual accuracy by cross-referencing claims with source documents. The system incorporates an intelligent workflow that uses confidence-based mechanisms to speed up responses for high-confidence queries while employing an automatic retry strategy for validation failures. This setup leverages Redis caching for efficient query handling and is supported by a robust FastAPI REST API designed for querying, complete with error management and latency monitoring.
For deployment, DocForge requires Python 3.11+ and keys from either OpenRouter or Google Gemini APIs, allowing configuration via environment variables for various services like LLM providers, Pinecone vector stores, and Redis caching. The system supports a comprehensive ETL pipeline to process PDF documents into manageable chunks with in-memory embedding cache to enhance efficiency by reducing redundant API calls. Its architecture begins with user query routing, followed by document retrieval from Pinecone, answer synthesis, confidence checking, validation, and result caching or retrying based on the derived confidence level.
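An outline of that confidence-gated flow is sketched below; the agent objects and the confidence threshold are placeholders for illustration, not DocForge's actual code.

```python
# Outline of a confidence-gated validation loop: high-confidence answers skip
# straight to caching, low-confidence ones go through validation and retries.
# The agent objects and the 0.85 threshold are placeholders for illustration.
def answer(query, router, retriever, analyst, validator, cache, max_retries=2):
    if (hit := cache.get(query)) is not None:
        return hit

    plan = router.route(query)                      # classify / rewrite query
    draft = None
    for attempt in range(max_retries + 1):
        docs = retriever.fetch(plan, attempt=attempt)
        draft, confidence = analyst.synthesize(query, docs)

        if confidence >= 0.85 or validator.check(draft, docs):
            cache.set(query, draft)
            return draft
    return draft  # best effort after exhausting retries
```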
Users can interact with DocForge through scripts for PDF ingestion and interactive Q&A testing. Future plans include expanding support to additional document formats like DOCX, TXT, MD, HTML; introducing streaming responses and conversation history; enhancing multi-turn chat capabilities; enabling multi-tenancy; developing a frontend UI; offering Docker containerization; and providing deployment guides for cloud platforms. The system utilizes tools such as LangGraph, LangChain, Pinecone, OpenAI, Google Gemini, and OpenRouter, under the MIT License developed by Toheed Asghar with contributions from AI assistance via Claude Opus 4 and Cursor IDE.
Keywords: #phi4, Adaptive Retrieval, Automatic Retry, Chain-of-Thought Reasoning, Confidence-based Validation, DocForge, Dual LLM Provider, ETL Pipeline, Fact-Checking, FastAPI, Google Gemini, LangGraph, Latency Monitoring, Multi-Agent RAG, OpenAI GPT, PDF Ingestion, Pinecone, Query Routing, Redis Caching, Retrieval-Augmented Generation, Token Usage Tracking, Vector Store
github.com 6 days ago
|
1410.
HN
Epstein Smart Search – AI RAG search pipeline, File explorer, Image gallery
Epstein Smart Search is an AI-powered search engine built over records released by the U.S. Department of Justice, using a Retrieval Augmented Generation (RAG) pipeline alongside vector embeddings to enable extensive searches through court documents, flight logs, depositions, and evidence files related to the Epstein case. The tool is designed to continuously incorporate new records, enhancing its thoroughness in search capabilities; at present, however, the search feature has been disabled. Users are encouraged to specify their queries clearly for optimal results. The system provides several hybrid search options that allow users to choose how many top documents are returned (Top K: 10, 20, 40, 60, 80, 100). Accessing these files requires users to verify they are at least 18 years old. Sample searches include inquiries about events at Zorro Ranch, connections between figures like Bill Clinton and Donald Trump with Epstein, and mentions of A-list celebrities within the documents.
Keywords: #phi4, AI RAG, Associations, Bill Clinton, Celebrities, Court Documents, Depositions, Documents, Donald Trump, Epstein, Evidence Files, File Explorer, Flight Logs, Hybrid Search, Image Gallery, Query, Smart Search, US Department of Justice, Vector Embeddings, Zorro Ranch
search.epstein.ninja 7 days ago
|
1419.
HN
RAG and Data Boundaries in Multi-Tenant Systems
In multi-tenant systems, Retrieval-Augmented Generation (RAG) presents significant security challenges due to its broad data retrieval approach followed by filtering, which risks accessing unauthorized information. To address these concerns, it is crucial to establish explicit modeling of layered access controls that maintain consistent boundaries across tenants. Arty proposes a solution where access rules act as a preliminary gate before any data retrieval occurs. This ensures that only documents eligible within the specified tenant scope, role visibility, and policy constraints are considered in similarity searches. By consuming pre-approved context rather than relying on post-retrieval security measures, accidental exposure of sensitive information is minimized. The strategy emphasizes creating clear data boundaries over solely depending on the AI's capabilities to enforce security. Arty encourages further discussion on effectively managing these trade-offs within production environments, highlighting the importance of balancing data access control with operational needs in multi-tenant architectures.
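A sketch of the access-rules-as-gate idea is shown below, assuming a vector store that accepts a metadata filter at query time; the method name and filter syntax vary by store and are assumptions here.

```python
# Sketch of access-gated retrieval: the tenant/role scope is resolved first and
# passed to the vector store as a hard filter, so out-of-scope documents never
# enter the similarity search. The filter syntax is an assumption.
def scoped_search(vector_store, query, user, k=5):
    allowed_collections = user.visible_collections()   # from the access model
    if not allowed_collections:
        return []

    return vector_store.search(
        query,
        k=k,
        filter={
            "tenant_id": user.tenant_id,
            "collection": {"$in": sorted(allowed_collections)},
        },
    )
```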
Keywords: #phi4, RAG, accidental exposure, branch-level rules, data boundaries, data model, layered access, multi-tenant systems, parent-level policies, policy constraints, role visibility, roles, security perspective, similarity search, tenant scope
news.ycombinator.com 7 days ago
|
1442.
HN
Private RAG and marketplace to sell your knowledge to AI agents
The service provides enterprises with an integrated solution for managing private Retrieval-Augmented Generation (RAG) systems and marketplaces through a single platform, which includes a unified API and operational model. This design eliminates the complexities typically introduced by adding separate solutions, offering streamlined operations. By centralizing these functions, businesses can effectively sell their knowledge to AI agents while maintaining control over enterprise operations, thereby enhancing efficiency without increasing complexity.
Keywords: #phi4, AI agents, API surface, Private RAG, bolt-on, complexity, distribution, enterprise operations, knowledge, marketplace, operational model, platform, retrieval
ragora.app 8 days ago
|
1490.
HN
Show HN: ClearDemand – Cross-case search and drafting for injury firms
ClearDemand is a platform specifically developed to enhance the accuracy of legal drafting within personal injury firms by addressing common issues associated with handling unstructured medical records and other case files. The tool leverages advanced technologies such as Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) to automate the summarization process, ensuring that the generated drafts include citations verified against original sources. One of its standout features is grounded generation, which provides source-verified drafting, alongside cross-case search capabilities that help attorneys identify similar fact patterns in other cases, thus improving efficiency and consistency. Additionally, ClearDemand offers style matching functions to align the document's tone with firm-specific preferences. Personal Injury attorneys have the opportunity to evaluate the tool through a 14-day trial period where they can test its effectiveness on scanned PDF documents. Feedback is invited specifically concerning the citation user interface (UI), underscoring the platform’s commitment to continuous improvement based on user input. Key features of ClearDemand include automated ingestion and OCR for case files, source-verified drafting with grounded generation, cross-case search functionality, style matching tailored to firm-specific tone preferences, and the availability of a 14-day trial period.
Keywords: #phi4, 14-day trial, AI Demand Letters, AI tone, ClearDemand, LLMs, OCR, Personal Injury Attorneys, RAG, accuracy, citation UI, cross-case search, demand letters, grounded generation, hallucination problem, legal drafting, medical evidence, personal injury firms, source-verified drafts, style matching, unstructured medical records
cleardemand.io 8 days ago
|
1506.
HN
MiRAGE: Open-source framework for multimodal RAG evaluation
MiRAGE is an open-source framework designed for evaluating multimodal Retrieval-Augmented Generation (RAG) systems, focusing on creating datasets from complex documents that contain visual elements such as charts, tables, and diagrams within PDFs. This addresses the inadequacies of traditional RAG benchmarks, which predominantly use text-only data. The evaluation process in MiRAGE is divided into three primary steps: Ingest, Generate, and Verify. During the Ingest phase, vision models are employed to interpret and segment visual elements from documents. In the Generate phase, a set of agents formulates multi-hop questions based on the processed content. Finally, in the Verify stage, an adversarial "Verifier Agent" cross-references generated answers with original data to ensure accuracy, which the authors report raises dataset reliability from 74% to 97%. The authors highlight challenges such as "Visual Grounding," a notable difficulty in multimodal RAG evaluation, and invite feedback on this. Resources for further exploration include an arXiv paper detailing their methodology and instructions for installation via pip.
Keywords: #phi4, MiRAGE, PDFs, RAG, adversarial verifier, agents, benchmarks, charts, datasets, diagrams, enterprise RAG, evaluation, fact-checking, framework, multi-hop questions, multimodal, open-source, self-verification, semantically chunk, synthetic data, tables, vision models, visual grounding
news.ycombinator.com 8 days ago
|
1533.
HN
Show HN: AppControl – A Modern Windows Task Manager with History
The document outlines a range of executable files developed by different companies to perform specific functions on Windows systems, enhancing overall usability and performance in various technological domains. **AppleMobileDeviceHelper.exe** is designed to facilitate synchronization, backups, and content transfers between Apple devices and Windows computers using iTunes or mobile device support software. The **AppleTV.exe** application allows access to Apple's streaming platform on Windows, enabling users to stream movies and TV shows via the Apple TV app. For wireless display capabilities, **IntelWiDiVAD64.exe** is part of Intel WiDi technology that streams content from devices like laptops to external displays including TVs or projectors.
In terms of remote support, **apple-scc.exe** is a component of Bomgar’s software, enabling IT professionals to remotely troubleshoot and manage end-user systems. The **AMD_Chipset_Software.exe** focuses on installing necessary drivers and utilities that enhance communication between the operating system and AMD chipsets, thereby improving performance and stability for AMD hardware users. For accurate time synchronization of Apple services running on Windows, **AppleTimeSrv.exe** operates in the background.
Security features are managed by **MicrosoftSecurityApp.exe**, which oversees Microsoft Defender's antivirus functionalities, including virus protection and threat detection. The integration of progressive web apps (PWAs) into the Firefox browser is facilitated by **Firefoxpwa-connector.exe**, allowing users to install and use these apps directly from their browsers. **IntelCpHeciSvc.exe** improves communication between Windows OS and Intel’s integrated graphics hardware, optimizing performance through Intel(R) pGFX.
For service integration, **AppleOutlookDAVConfig64.exe** helps integrate Apple services like iCloud with Microsoft Outlook for syncing calendar and contact data on Windows systems. Lastly, **NVIDIA ChatRTX.exe** enables a local AI chatbot application on PCs equipped with NVIDIA RTX GPUs, utilizing advanced technologies to allow users’ personal files to interact with a GPT-based language model for personalized query responses. Collectively, these executables enhance device management, streaming, remote support, system optimization, security, and service integration across various platforms.
Keywords: #phi4, AMD Chipset Software, AppControl, Apple, Apple TV, Bomgar Remote Support, Boot Camp, CalDAV, CardDAV, Firefox PWA, GPT-based AI, HECI, Intel Graphics, Intel WiDi, Microsoft Defender, Mobile Device Support, NIM microservices, NVIDIA ChatRTX, RAG, Task Manager, TensorRT-LLM, Windows, iCloud Outlook Integration, iTunes
www.appcontrol.com 8 days ago
|
1557.
HN
Show HN: Logarete – Historical thinkers debate each other via RAG
Logarete is an innovative platform founded by an ex-astrophysicist who shifted from academia to entrepreneurship, motivated by exploring humanity's purpose in a technologically advanced era. The name "Logarete," derived from Greek words for reason (Logos) and excellence (Arete), encapsulates its mission to elevate personal potential through logical dialogue. Designed to promote intellectual growth, Logarete facilitates connections between users and historical thinkers, encouraging meaningful conversations and self-exploration. Its founder envisions the platform as an "operating system for humanity's intellect," providing guidance reminiscent of timeless mentors during crucial life moments. Inspired by Socratic philosophy, Logarete aims to cultivate a reflective and inspired way of living, fostering deeper understanding and personal development through thoughtful engagement with historical wisdom.
Keywords: #phi4, Arete, Astronomer, Astrophysicist, Connection, Conversations, Debate, Excellence, Founder's Note, Great thinkers, Greek words, Historical thinkers, Humanity, Intellect, Logarete, Logos, Operating system, Quasars, RAG, Reason, Schools, Society, Socrates, Studies, Symposium, Technology, Virtue
logarete.com 8 days ago
|
1562.
HN
The State of Agentic Graph RAG
Retrieval-Augmented Generation (RAG) with vector-based methods effectively supports applications involving private data by embedding and retrieving document chunks, but it struggles with complex reasoning tasks due to its reliance on semantic similarity rather than evidential relevance. The limitations of vector RAG include challenges in handling global questions without aggregation, multi-hop questions that lack inter-document connections, and logic and direction issues stemming from an asymmetric relationship focus. Graph RAG offers solutions by using explicit entity relations within a graph structure composed of nodes (entities) and edges (relationships), facilitating more nuanced retrievals based on connections rather than just similarity. This involves indexing to extract entities and relationships, retrieval through relevant subgraphs, and generation providing structured context for models.
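A compact, generic sketch of this retrieve-by-connection pattern is shown below, assuming a networkx entity graph; it illustrates the general idea rather than any of the specific systems discussed.

```python
# Generic graph-RAG retrieval sketch: vector search finds seed entities, then a
# bounded hop expansion over the entity graph collects connected evidence that
# pure similarity search would miss. Uses networkx for the graph structure.
import networkx as nx


def retrieve_subgraph(graph: nx.DiGraph, seed_entities, hops=2):
    frontier, selected = set(seed_entities), set(seed_entities)
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            nxt.update(graph.successors(node))
            nxt.update(graph.predecessors(node))
        frontier = nxt - selected
        selected |= nxt
    return graph.subgraph(selected)


def to_context(subgraph: nx.DiGraph) -> str:
    # Serialize edges as (head, relation, tail) lines for the generator prompt.
    lines = [f"{u} -[{d.get('relation', 'related_to')}]-> {v}"
             for u, v, d in subgraph.edges(data=True)]
    return "\n".join(lines)
```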
Foundational papers such as Microsoft's "From Local to Global" detail entity graphs and clustering for global queries, while HippoRAG employs Open Information Extraction and Personalized PageRank for schemaless triple retrieval. LightRAG emphasizes throughput with specific and cluster-level retrievals. Agentic Graph RAG introduces an iterative, policy-driven process involving exploration, decision-making, and self-correction, with planning decomposing tasks into sub-objectives using working memory. Hybrid retrieval combines vector footholds and graph-structured movement adjusted based on feedback.
LogicRAG critiques static structures by proposing query-specific reasoning graphs that dynamically adapt without costly offline graph building, efficiently decomposing queries to solve subproblems while pruning redundant elements. Graph RAG faces challenges such as entity resolution balancing over-merging (contamination) and under-merging (fragmentation), structural debt from inaccurate extraction leading to misinformation, and summary drift in community summaries causing loss of evidence grounding.
Future directions emphasize treating retrieval as a reasoning process with retrievers possessing memory and checkpoints. The goal is to develop trustworthy systems capable of robust identity handling, reliable extraction processes, accurate summaries maintenance, and agent-based recognition for additional evidence requirements. Agentic Graph RAG aims to transform search from an autocomplete function into investigative behavior, ensuring it supports complex and nuanced inquiries effectively.
Keywords: #phi4, Agentic Graph RAG, Global questions, Hybrid Retrieval, Logic and direction, LogicRAG, Personalized PageRank, Plan-on-Graph, Retrieval-Augmented Generation, Think-on-Graph, embeddings, entity resolution, evidential relevance, multi-hop questions, semantic similarity, structural debt, summary drift, vector RAG
localoptimumai.substack.com 8 days ago
|
1591.
HN
Show HN: EverSwarm – Autonomous Recursive Growth Engine (ARGE) for RAG Swarms
Mike Nathan introduces EverSwarm, an advanced ecosystem designed to enhance Retrieval-Augmented Generation (RAG) through agentic swarms. The platform integrates a unique blueprint that combines EverSwarm RAG with Multi-Agent Orchestration/Multi-Agent Business Automation (MoA/MoBA), incorporating elements like orchestrator/judge and MCP-based coordination along with hybrid retrieval methods. Its primary goal is to bridge the gap between AI technologies and business owners, ensuring equitable outcomes via the Autonomous Recursive Growth Engine (ARGE). This initiative aspires to develop a sovereign intelligence stack tailored for the hybrid compute economy, emphasizing effective multi-agent orchestration and managing RAG drift efficiently. Mike Nathan invites community feedback on these innovative areas to refine and improve the platform further.
Keywords: #phi4, AI, ARGE, EverSwarm, MCP-based coordination, MoA/MoBA, RAG Swarms, RAG drift management, business owners, hybrid compute economy, hybrid retrieval, multi-agent orchestration, orchestrator/judge, sovereign intelligence stack
news.ycombinator.com 8 days ago
|
1593.
HN
"Sci-Fi with a Touch of Madness"
The text provides an insightful overview of various themes within the AI industry, highlighting innovations, challenges, and ethical considerations across different domains. It begins by examining a potential decacorn status for Harvey through rumored funding, cautioning against premature confirmation of such financial achievements.
A significant focus is placed on OpenClaw's triumph as a leading agent framework in spite of initial skepticism towards open-source models, which traditionally lag behind closed-source alternatives. This success supports The Agent Labs Thesis and underscores the viability of open-source approaches exemplified by companies like Ramp and Stripe.
The AI industry segment discusses OpenAI’s Codex (GPT‑5.3‑Codex), marketed for application development, with its rapid adoption marked by increased downloads and engagement. However, it faces practical challenges, including UI issues and ecosystem tensions that complicate integration.
Claude Opus 4.6 emerges as a potent AI agent, utilizing Recursive Language Models (RLMs) to handle tasks requiring extensive contextual understanding through programmatic context pools. OpenAI’s Codex is also noted for its widespread distribution across platforms like Cursor and GitHub, although engineers encounter challenges such as interface labeling problems.
The narrative on RLM developments highlights their role in managing complex, long-context tasks with enhanced capabilities demonstrated by open-weights versions. Furthermore, innovations in Mixture-of-Experts (MoE) models introduce efficient communication patterns like Head Parallelism aimed at optimizing performance.
Open Model Pipeline discussions revolve around rumored advancements such as GLM‑5 and Kimi K2.5 developments while expressing skepticism about current MoE architectures’ efficacy.
The practical application of agent frameworks necessitates robust harnesses for effective implementation, with a focus on rigorous testing environments essential for offline research and full-stack coding agents. Subreddit highlights point out Opus 4.6’s impressive UI design capabilities, alongside ethical concerns regarding its profit-maximizing behavior without constraints, illustrating the potential dangers when AI lacks ethical guidelines.
Gemini AI tools receive mixed feedback from users who report issues like inadequate prompt handling and inferior image generation compared to GPT-4o, indicating a perceived decline in model quality post-update. Users’ dissatisfaction leads some to cancel subscriptions or explore alternatives from OpenAI and Anthropic.
Model competitions reveal Opus 4.6's high leaderboard ranking despite user criticisms about its tendency to overthink and output limitations. Codex 5.3 is lauded for backend task efficiency, emphasizing ongoing improvements and challenges in AI tools compared across various performance metrics.
Architectural advancements include techniques like Wasserstein memory compression that aim to significantly reduce RAM usage, alongside new datasets and numerical methods enhancing GPU kernel performance, focusing on improving model efficiency and stability.
Benchmarking discussions introduce Veritas as a notable improvement over existing benchmarks, prompting calls for clearer baseline definitions. Tools such as agentrial are highlighted for their role in refining regression testing processes within AI development.
Security concerns address risks including KYC requirements, data leaks, and prompt safety, emphasizing the need for robust measures to mitigate these challenges across AI platforms. Overall, the document encapsulates ongoing debates in AI ethics, user satisfaction, technical performance, and security, reflecting a dynamic landscape of innovation and scrutiny.
Keywords: #phi4, AI Industry, Agent Framework, Alignment Problem, Claude Opus 4.6, Codex, Decacorn, Docker, Ethics, GLM 5, GPT-5.3-Codex, GPU optimization, Gemini AI, Lightning Pod, Local Llama, Madness, MoE, Neural Networks, Offline AI, OpenClaw, Opus 4.6, Privacy-first, Profit Maximization, Qwen3-Coder-Next, RAG, RLMs, Sci-Fi, Sparsity, Super Bowl, Transformers, UI Design, Vending Bench, Vision-Language Models, Winograd transforms, Zero-day Vulnerabilities, benchmarks, platform risk, regression testing, security risks
www.latent.space 8 days ago
|
1675.
HN
A open source pageindex implementation
The "pageindex-open" package offers an open-source solution that indexes PDF documents into a tree structure to enhance information retrieval by maintaining document hierarchy and providing structured context for relevance. Unlike traditional Retrieval-Augmented Generation (RAG) systems, which rely on embedding similarities, this approach enables precise answers by preserving the hierarchical nature of documents and using top-K retrieval to combine multiple relevant sections. It minimizes storage requirements through text-on-demand functionality and stores a persistent cache in Markdown format. The package provides a clean Python API with functions like `build_index()`, `query()`, and `load_index()` for developers, ensuring ease of integration into large document question-answering workflows, particularly in structured environments such as finance and legal sectors. Its design allows for easy updates or additions without needing to rebuild the index, thus enhancing its reusability and maintenance efficiency, making it an effective tool for scalable document management tasks.
Keywords: #phi4, AI reasoning, Markdown, PDFs, Python API, RAG, build_index, cache, document QA workflows, embeddings, finance, hierarchical, legal, litellm client, load_index, model provider, open source, pageindex, production-ready, query, relevance, structured documents, tree structure
pypi.org 9 days ago
|
1718.
HN
Multi-scale RAG indexing: why different queries need different chunk sizes
The blog explores how Retrieval-Augmented Generation (RAG) systems can be optimized by varying chunk sizes for improved retrieval performance. Traditional methods utilize a fixed chunk size to balance details with context, but this approach often fails for diverse queries due to its one-size-fits-all nature. Research involving oracle experiments on datasets like QMSum, NarrativeQA, and a custom Seinfeld dataset reveals that different queries require different chunk sizes for optimal results. An "oracle" model selecting the best chunk size per query achieves much higher recall than any fixed size.
To circumvent the need for retraining models or complex preprocessing, the authors propose multi-scale indexing with Reciprocal Rank Fusion (RRF). This method involves creating several indices of a corpus at various chunk sizes and combining retrieval results during inference. Each retrieved chunk votes for its parent document, with RRF aggregating these votes to rank documents effectively.
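A small sketch of that fusion step, assuming each chunk-size index returns a ranked list of (chunk_id, parent_doc_id) pairs; the smoothing constant k = 60 is the value commonly used for RRF, not one stated in the post.

```python
# Document-level Reciprocal Rank Fusion over several chunk-size indices.
from collections import defaultdict

def fuse_multiscale(ranked_lists, k=60, top_n=10):
    """ranked_lists: one ranked result list per chunk-size index."""
    doc_scores = defaultdict(float)
    for results in ranked_lists:                        # one list per chunk size
        for rank, (chunk_id, parent_doc) in enumerate(results, start=1):
            doc_scores[parent_doc] += 1.0 / (k + rank)  # each chunk "votes" for its parent doc
    return sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Example: results from 100-, 200-, and 500-token indices for one query.
small  = [("c1", "docA"), ("c7", "docB"), ("c9", "docC")]
medium = [("m2", "docB"), ("m5", "docA")]
large  = [("l1", "docB")]
print(fuse_multiscale([small, medium, large]))          # docB accumulates the most votes
```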
This innovative approach outperforms traditional single-chunk-size indexing on multiple benchmarks without additional retraining or preprocessing. It presents a simple, model-agnostic solution that uses multiple document representations simultaneously, deferring the decision of chunk size until inference when more query context is available. This method highlights the critical role of dynamic chunk size selection in enhancing RAG systems' retrieval performance and encourages further research and application across different contexts.
Keywords: #phi4, Multi-scale RAG, Reciprocal Rank Fusion (RRF), aggregation, benchmarks, chunk sizes, corpus, embeddings, indexing, oracle experiments, queries, retrieval, sliding-window
www.ai21.com 9 days ago
|
1752.
HN
Show HN: I built a RAG search engine over the Epstein court documents
The "Epstein Documents RAG" is a sophisticated search engine designed for efficiently querying over 4.1 million vectors generated from Epstein court documents, depositions, and related evidence. It enables rapid vector lookups and incorporates Retrieval-Augmented Generation (RAG) technology combined with Large Language Model (LLM) responses to facilitate comprehensive searches. This tool assists users in conducting detailed investigations by enhancing search capabilities through advanced data retrieval methods. For additional information or support regarding the system, users are directed to contact the developer at findhiddensecrets@gmail.com.
Keywords: #phi4, Ask, Contact, Epstein court documents, LLM answer, RAG search engine, Search, Show HN, court documents, depositions, evidence, fast lookup, findhiddensecrets, vector lookup, vectors
jefilesearch.com 9 days ago
|
1829.
HN
Show HN: EasyMemory – Local-First Memory Layer for Chatbots and Agents
EasyMemory is an open-source Python library developed to provide a local-first memory solution for chatbots and agent-based systems, eliminating reliance on cloud services. The library employs a modular approach that includes automatic conversation persistence and hybrid retrieval methods such as embeddings, keyword search, and graph-style links. It supports various file formats like PDF, TXT, DOCX, and Markdown, enhancing its versatility. Additionally, EasyMemory offers optional integrations with platforms like Slack, Notion, and Google Drive, and incorporates an MCP server to connect both local and remote large language models. By enabling experimentation with different memory patterns locally, EasyMemory encourages feedback and allows comparisons with other memory management techniques such as RAG and long-term context strategies. This initiative aims to provide a flexible foundation for developing advanced agent-based systems without external dependencies, further details of which can be accessed in its GitHub repository.
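This is not EasyMemory's actual API; it is only a minimal illustration of the hybrid-retrieval idea the summary describes, blending embedding similarity with keyword overlap using an arbitrary weight.

```python
# Illustrative hybrid scoring: cosine similarity of embeddings blended with
# simple keyword overlap. The weight alpha is arbitrary.
import numpy as np

def hybrid_score(query_vec, doc_vec, query_terms, doc_terms, alpha=0.7):
    cos = float(np.dot(query_vec, doc_vec) /
                (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec) + 1e-9))
    overlap = len(set(query_terms) & set(doc_terms)) / max(len(set(query_terms)), 1)
    return alpha * cos + (1 - alpha) * overlap   # weighted blend of the two signals
```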
Keywords: #phi4, DOCX support, EasyMemory, Google Drive integration, LLMs, MCP server, Markdown support, Notion integration, PDF support, Python library, RAG, Slack integration, TXT support, agent memory, agents, chatbots, cloud dependency, conversation persistence, embeddings, graph-style links, hybrid retrieval, keyword search, local-first memory, long-term context management
news.ycombinator.com 10 days ago
|
1851.
HN
Show HN: The biggest achievement of my life so far
The "Explore Singapore" project is an open-source intelligence engine utilizing Retrieval-Augmented Generation (RAG) technology to deliver precise information from Singapore's public policy documents, legal statutes, and historical archives. Developed by a dedicated coder, this tool aims to enhance the accuracy of language models by exclusively sourcing data from government documents. The system significantly aids Python developers interested in RAG technology by providing access to accurate legal insights without the need for manual PDF searches. It surpasses traditional Large Language Models (LLMs) by offering exact citations and direct links to specific law sections.
The project boasts a robust triple-failover backend with models such as Google Gemini 2.0 Flash, Llama 3.3 via OpenRouter, and Groq serving as backups to ensure reliability. Its frontend is designed using React and Framer Motion, featuring a minimalist style enriched by interactive elements like real-time blur effects.
The technical framework includes PyPDF2 for PDF parsing, Hugging Face BGE-M3 embeddings, FAISS for vector similarity search, Flask for backend services, and Docker-based deployment on Hugging Face Spaces. The document ingestion process involved transforming over 33,000 pages into vectors swiftly using Google Colab. Despite its advancements, the project faces challenges in optimizing ranking strategies to avoid irrelevant document retrieval. Users are encouraged to provide feedback to improve accuracy and functionality, with further exploration available through the GitHub repository.
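A rough sketch of that ingestion path under a typical setup: BGE-M3 embeddings loaded through sentence-transformers and indexed with FAISS. The model identifier, normalization choice, and index type are assumptions rather than the project's exact configuration.

```python
# Embed page text with BGE-M3 and index the vectors in FAISS (typical setup).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")    # one common way to load BGE-M3
pages = ["Section 377A ...", "Penal Code Chapter 224 ..."]   # parsed PDF pages (e.g. via PyPDF2)

vecs = model.encode(pages, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(vecs.shape[1])      # inner product equals cosine on normalized vectors
index.add(vecs)

q = model.encode(["penalties for theft"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(q, 3)              # top-3 most similar pages
```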
Keywords: #phi4, AI agents, Arcee AI, BGE-M3, Docker-based cloud hosting, FAISS, Flask, Framer, Google Gemini, Groq, LLM systems, LangChain, PDFs, PyPDF2, Python developers, RAG, React, Singapore, domain-specific search, embeddings, historical archives, intelligence engine, interactive UI, laws, legal statutes, local embedding inference, open-source, public policy, retrieval-augmented generation, triple-failover backend, vector database
github.com 10 days ago
https://adityaprasad-sudo.github.io/Explore-Singapore/ 9 days ago
|
1900.
HN
You don't need RAG in 2026
By 2026, advancements in language model capabilities and infrastructure improvements render Retrieval-Augmented Generation (RAG) largely unnecessary for many applications. Modern models like Gemini 2.0 and Claude Sonnet 4 have expanded context windows that can handle large documents directly, eliminating the need for chunking and retrieval processes previously essential due to smaller context sizes. For typical RAG use cases involving small corpora, such as internal documentation or knowledge bases, content fits within a single prompt, simplifying implementation by avoiding complex pipelines. Although longer contexts may increase costs and latency, these tradeoffs are minimal compared to the engineering overhead of maintaining a full RAG system.
In scenarios requiring search over large datasets, existing infrastructures like Elasticsearch provide robust solutions for relevance ranking and filtering without needing separate vector databases. These systems can be enhanced with language models for semantic understanding, offering most benefits of vector search without additional infrastructure. Vector search should be viewed as an enhancement to current database capabilities rather than a standalone requirement, as databases such as PostgreSQL and Elasticsearch now support vector similarity searches natively.
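As a concrete illustration of that point, a vector similarity query can be issued from Python against an existing PostgreSQL table with pgvector; the table and column names below are hypothetical, and `<=>` is pgvector's cosine-distance operator.

```python
# Querying an existing Postgres table with pgvector instead of a separate vector DB.
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT id, title
        FROM docs
        ORDER BY embedding <=> %s::vector   -- cosine distance, smaller is closer
        LIMIT 5
        """,
        ("[0.12,-0.03,0.88]",),             # query embedding as a vector literal (hypothetical)
    )
    print(cur.fetchall())
```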
Dedicated vector infrastructure is only necessary in specific cases, including multimodal searches (e.g., images, audio), large-scale recommendation systems, cross-lingual search, or high-volume cost optimization. For most applications, leveraging existing tools and larger context windows provides a simpler, more efficient solution.
Keywords: #phi4, Claude Sonnet 4, Elasticsearch, Gemini 2.0, HNSW, IVFFlat, Llama 4 Scout, Pinecone, Qdrant, RAG, Retrieval-Augmented Generation, Solr, Weaviate, approximate nearest neighbor (ANN), context window, cross-lingual search, internal docs, knowledge base, language model, multimodal search, pgvector, recommendation systems, semantic retrieval, vector database, vector embeddings
ryanlineng.substack.com 10 days ago
|
1934.
HN
Show HN: AI agent forgets user preferences every session. This fixes it
Pref0 is an innovative tool designed to enhance the consistency of AI agents in remembering and applying user preferences across sessions. By extracting structured preferences from user interactions, it ensures that corrections made by users are retained and utilized effectively over time. For instance, if a customer support agent learns to escalate billing issues based on user feedback, pref0 captures this preference with an initial confidence level that increases as the user reinforces it in future interactions. This results in automatic correct routing of similar issues without needing further input.
The system maintains structured profiles for users, teams, or organizations, which are accessed by AI agents before generating responses. Pref0 features a minimal API with endpoints to track conversation history and retrieve learned preferences. It prioritizes explicit corrections over implied ones and supports hierarchical preference settings, allowing user-specific preferences to override team or organizational defaults. Additionally, confidence levels can decay over time to prevent outdated preferences from persisting.
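A hypothetical sketch of how such an API might be called; the endpoint paths, payload fields, and base URL below are illustrative guesses rather than pref0's documented routes.

```python
# Hypothetical pref0 calls: record a correction, then fetch learned preferences.
import requests

BASE = "https://api.pref0.com"            # assumed base URL
headers = {"Authorization": "Bearer <API_KEY>"}

# Record a correction the user made during a support conversation.
requests.post(f"{BASE}/track", headers=headers, json={
    "user_id": "u_123",
    "messages": [{"role": "user", "content": "Always escalate billing issues to a human."}],
})

# Fetch learned preferences before the agent generates its next reply.
prefs = requests.get(f"{BASE}/preferences", headers=headers,
                     params={"user_id": "u_123"}).json()
```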
Pref0 is versatile in its integration capabilities, compatible with platforms like LangChain, CrewAI, Vercel AI SDK, or through raw API calls, and offers a free tier for users. Unlike traditional memory solutions that focus on storing interactions, pref0 emphasizes learning user desires, thereby complementing existing systems by ensuring preferences are remembered and applied consistently.
Keywords: #phi4, AI agents, API endpoints, CrewAI, LangChain, RAG, Tailwind, Vercel AI SDK, confidence, conversation history, corrections, customer support agent, explicit corrections, feedback, hierarchical preferences, memory layers, profiles, session, structured preferences, user preferences
www.pref0.com 10 days ago
|
1993.
HN
I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife
The blog post details the development of "movieagent.io," a multi-user movie recommendation system designed to cater to differing tastes between the author and his wife by facilitating efficient movie selection. The system comprises two main components: a primary movie agent that orchestrates conversation flow, and a search agent responsible for executing specific searches using embeddings. Initially, users are engaged with categorical questions to establish mood preferences, followed by "duels" where they choose between pairs of movies, providing clear preference signals. These inputs guide the search agent in conducting embedding searches within a database containing approximately 70,000 movies from TMDB, refining results based on user feedback and specific movie anchors.
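The following sketch is not the project's code; it only illustrates how duel winners could serve as anchors by averaging their embeddings and ranking candidates by cosine similarity to that anchor vector.

```python
# Rank candidate movies against an "anchor" built from the user's duel winners.
import numpy as np

def rank_by_anchor(winner_vecs, candidate_vecs, candidate_titles, top_n=5):
    anchor = np.mean(winner_vecs, axis=0)
    anchor /= np.linalg.norm(anchor) + 1e-9
    cands = candidate_vecs / (np.linalg.norm(candidate_vecs, axis=1, keepdims=True) + 1e-9)
    sims = cands @ anchor                        # cosine similarity to the preference anchor
    order = np.argsort(-sims)[:top_n]
    return [(candidate_titles[i], float(sims[i])) for i in order]
```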
The author addresses challenges such as language model knowledge cutoffs and the necessity for diverse recommendations by enhancing data with generated descriptions that encapsulate each movie's essence. To maintain performance and cost efficiency, the system avoids a monolithic architecture. Evaluation involved using synthetic personas from another project, with results manually inspected and rated through an LLM judge. Future enhancements include updating the database to automatically incorporate new movies, ensuring the system remains current and relevant.
Keywords: #phi4, Agent, Automated Judge, Categorical Questions, Conversation Design, Data Framework, Duel Question, Embeddings Search, Evaluation, Keyword Search, LLMs, Movie Recommendation, Multi-user System, Persona Simulation, RAG, Semantic IDs, Vector Math
rokn.io 11 days ago
|
2029.
HN
Show HN: I built a RAG engine to search Singaporean laws
A student-developer created "Explore Singapore," an advanced search engine designed to access over 20,000 pages of Singaporean laws and government acts using Retrieval-Augmented Generation (RAG). Initially, Version 1 faced challenges with hallucinations and limited query depth. To address these issues, the developer introduced several enhancements in Version 2: a Personality Fix through Dynamic System Instructions ensured consistent tone across models; a Deep Search Fix via Multi-Query Retrieval broke down queries into sub-intents for more thorough results; and a Hallucination Fix using Cross-Encoder Re-Ranking filtered out irrelevant documents before processing. The system's tech stack includes BGE-M3 embeddings, FAISS vector database, and a Python backend with custom failover logic, while the frontend features an Apple-inspired minimalist design utilizing React and Framer Motion for interactivity. Emphasizing reliability, it incorporates "Triple-AI Failover" and local embedding inference to boost performance and privacy. The developer invites feedback on this improved system, accessible through a live demo or GitHub repository.
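A sketch of the cross-encoder re-ranking stage using the sentence-transformers `CrossEncoder` API; the checkpoint shown is a common public re-ranker and an assumption, not necessarily the model this project ships.

```python
# Re-rank retrieved passages with a cross-encoder before they reach the LLM.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint

query = "What is the penalty for littering in Singapore?"
candidates = ["Environmental Public Health Act ...", "Road Traffic Act ...", "Penal Code ..."]

scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
# Low-scoring documents can be dropped here, which is the "hallucination fix"
# the summary describes.
```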
Keywords: #phi4, AI technology, BGE-M3, Cross-Encoder Re-Ranking, Docker-based hosting, Dynamic System Instructions, Embeddings, FAISS, Flask, Framer Motion, Gemini 2.0 Flash, Glassmorphism, Hugging Face Spaces, Legal search engine, Llama 3.3, Local embedding inference, Multi-Query Retrieval, Python, RAG engine, React, Semantic embeddings, Singaporean laws, Triple Failover, Vector DB
github.com 11 days ago
|
2113.
HN
Why RAG Failed Us for SRE and How We Built Dynamic Memory Retrieval Instead
The article explains that Retrieval‑Augmented Generation (RAG) was inadequate for Site Reliability Engineering (SRE) tasks and presents Dynamic Memory Retrieval (DMR) as the solution powering DrDroid AI. DMR enables the agent to retrieve current, precise data from production environments that evolve gradually, leveraging over 80 Systems of Record (SoRs) such as monitoring tools (Grafana, Prometheus), APMs (Datadog, NewRelic), cloud platforms (AWS, Azure, GCP), Kubernetes, error monitoring (Sentry, Rollbar), CI/CD pipelines (ArgoCD, Jenkins), source‑code repositories (GitHub, GitLab), collaboration platforms (Slack), ticketing systems (Jira), on‑call services (PagerDuty), databases (MongoDB, Postgres), analytics platforms (Posthog, Metabase), documentation tools (Notion, Confluence), and custom APIs. DrDroid first extracts “Entities of Interest” (EoIs) from each SoR—for instance, Grafana dashboards, panels, and alerts, or Kubernetes namespaces, deployments, and pods—to build a detailed base record that maps specific use cases and references such as a “payment module” to the corresponding Grafana panel; these EoIs are then indexed to make the information queryable and enable accurate, up‑to‑date production queries.
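Purely for illustration, an EoI base record of the kind described might look like the dataclass below; the field names are assumptions, not DrDroid's schema.

```python
# Illustrative "Entity of Interest" record mapping a business concept to a
# concrete Grafana panel. Field names are assumptions, not DrDroid's schema.
from dataclasses import dataclass, field

@dataclass
class EntityOfInterest:
    source: str                  # which System of Record, e.g. "grafana"
    kind: str                    # e.g. "dashboard", "panel", "alert"
    name: str                    # human-readable name
    reference: str               # stable ID/URL used to query the SoR at runtime
    use_cases: list[str] = field(default_factory=list)   # e.g. ["payment module"]

eoi = EntityOfInterest("grafana", "panel", "Payments p99 latency",
                       "dash-42/panel-7", ["payment module"])
```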
Keywords: #gpt-oss:20b, AI Agent, APM, DMR, DrDroid, Grafana, Infrastructure, Logs, Metrics, Monitoring, Production, RAG, SRE, SoR, Traces, dashboards, panels
drdroid.io 12 days ago
|
2139.
HN
Ask HN: Do you use LLM memory features?
The author finds the AI assistant’s built‑in memory opaque and unreliable, so they now store essential context in Markdown files and reference those files on demand. This approach gives complete visibility, eliminates hidden recalls, simplifies debugging, and ensures predictable token usage; it requires manual maintenance, but the author still finds it more dependable. They invite the community to share whether they rely on system memory or manage context explicitly (e.g., through files or RAG), and which methods work best.
Keywords: #gpt-oss:20b, AI assistants, Ask HN, LLM memory, RAG, built-in memory, context, debugging, explicitly reference, files, manual maintenance, md files, memory, opaque, reliable, token usage, unreliable, visibility
news.ycombinator.com 12 days ago
|
2239.
HN
Show HN: HyperAgency (H9y.ai) – Open-Source Agentic AI Operating System
HyperAgency (H9y.ai) is an open-source, self-hosted agentic AI operating system designed to enable organizations to deploy autonomous, self-improving AI agents capable of performing a wide range of tasks. The platform supports persistent memory, coordinated intelligence, human governance, omni-channel integration, and a decentralized architecture, along with a Web3 marketplace for the exchange and monetization of agentic workflows. It features a modular design that allows for the deployment of 20+ ready-to-deploy agent archetypes, which are composable, versionable, and portable, supporting functionalities such as chat, RAG, image generation, and web automation. These agents can interface with multiple communication channels and systems, leveraging any compatible LLM or model from various providers, thus avoiding vendor lock-in. The platform also includes tools for real-time observability, privacy-first data handling, and support for distributed networks, with deployment options that include self-hosting or cloud environments. Users can customize their setup using Docker Compose profiles—*try*, *h9y*, and *all*—configured via the `.env` file, and the project is inspired by Hal Casteel and William McKinley, with a licensing model that includes Apache-2.0-NC, AGPL-3.0, and a Commercial License. A paid pilot program is available for early participants to engage in real-world deployment of agentic systems, offering hands-on experience in building autonomous AI workflows and shaping the future of autonomous software companies.
Keywords: #qwen3:14b, A2A, AGPL-30, AI, Agent, Agentic AI, Agentic Deals, Apache-20-NC, Archetypes, Automation, Avatar, Bridges, Builders, Capabilities, Clone, Cloud, Cloud Access, Code, Collaboration, Commercial License, Communication, Composable, Control, Coordinated Intelligence, Curl, Data Ownership, Debug, Decentralized, Demo, Digital, Docker, Docker Compose, Early Testing, Ecosystem, Env Files, Evolution, Extensible, Full, Gen-Certs, Git, Governance, Health, Horizontal Scale, Human Governance, HyperAgency, HyperAgent, ImageGen, Infrastructure, Innovators, Integration, Isolated, Isolated Data, Langflow, Licensing, Local Setup, Logs, MCP, Maptrix, Marketplace, Memory, MetaAgent, Metrics, Model, Monetize, Monitoring, N8n, Network, Node-RED, Nodes, Notebook, Observability, Omni-Channel, Open-Source, Organization, Ownership, Performance, Persistent Agency, Persistent Memory, Pilot, Pre-Configured, Privacy, Privacy-First, Providers, Publish, RAG, Real-Time, STT, Secure Peer-to-Peer, Secure Storage, Self-Host, Setup Hosts, Share, Storage, Submodule, System, System Health, TLS, TTS, Team, Trace, Transform, Trust, Vault, Verify, Visibility, Web App, Web3, Web3 Marketplace, Workflow, XMPP Server, env, hosts, localhost
github.com 13 days ago
|
2291.
HN
AI-assisted cloud intrusion achieves admin access in 8 minutes
Sysdig’s Threat Research Team uncovered an AI-driven breach that gained administrative control of a target AWS environment in under ten minutes by exploiting misconfigured IAM credentials and publicly exposed S3 buckets containing AI data. Attackers exfiltrated those credentials, used the compromised IAM user’s Lambda read/write rights to inject malicious code, and leveraged Amazon Bedrock’s large-language-model (LLM) capabilities to auto-generate additional code for privilege escalation and lateral movement across 19 distinct AWS principals, while commandeering GPU instances (p4d.24xlarge) for model training and expansion. The CI/Escape chain included a Terraform-deployed, unauthenticated Lambda URL acting as a Bedrock backdoor; exhaustive enumeration of Secrets Manager, SSM parameters, CloudWatch logs, and IAM Access Analyzer findings; and widespread access to Bedrock models (Claude, Llama, Cohere, DeepSeek, etc.) across multiple regions, with evidence such as Serbian-commented scripts, hallucinated AWS account IDs, and absent GitHub links. Gaps highlighted include unmonitored Lambda update activity, unrestricted Bedrock invocation, and insufficient S3 bucket protection. Sysdig recommends tightening least-privilege IAM policies; restricting UpdateFunctionCode, UpdateFunctionConfiguration, and PassRole permissions; securing S3 access; enabling continuous Bedrock call logging; enforcing AWS Notable Events and Behavioral Analytics to detect lateral movement and excessive enumeration; and implementing early runtime detection and rapid response to counter evolving AI-assisted attacks.
Keywords: #gpt-oss:20b-cloud, AWS, Bedrock, CloudTrail, Credential theft, GPU, IAM, LLMs, Lambda, Privilege escalation, RAG, S3, Sysdig
www.sysdig.com 13 days ago
|
2311.
HN
Synthesizing scientific literature with retrieval-augmented language models
OpenScholar is a retrieval‑augmented QA system that integrates a 45‑million‑paper scientific data store (OSDS) with a bi‑encoder candidate retrieval step, a cross‑encoder reranker, optional Semantic Scholar API and web‑search augmentations, and a generator that produces answers and structured citations; it iteratively refines responses through a self‑feedback loop that annotates drafts, offers critique, and generates new query suggestions until all claims are sourced, while training employs synthetic datasets derived from the same inference pipeline using Llama 3.1‑7/8 B models filtered by pairwise and rubric‐based quality controls, blended with general instruction data to fine‑tune a Llama‑3.1‑8B‑Instruct capable of 3 k‑token outputs at controlled temperature and vLLM acceleration. The ScholarQA Bench, constructed via PhD‑level expert annotation of 100 CS questions (each averaging 4.4 essential answer components and 4.4 source quotes), assesses model performance across a range of multi‑paper reasoning tasks, as evidenced by inter‑annotator Pearson correlations of 79.3 % with the general criterion and 59.5 % without it; complementary datasets (Scholar‑Bio, Scholar‑Neuro, Scholar‑Multi) extend this paradigm to biomedicine, neuroscience, and cross‑disciplinary fields, each instance requiring approx. 56 min of manual answer sourcing. Evaluation involves a weighted overlap metric (60 % correctness, 40 % general criteria like length, expertise, citation quality, excerpt usage) with final scoring by GPT‑4o Turbo, citation F1 calculated from recall and precision of referenced passages without gold answers, and content‑quality rubrics (relevance, depth, organization, flow, usefulness) adjudicated by Prometheus v2 at >80 % agreement with human raters. The study reports that OpenScholar surpasses previous proprietary pipelines and even exceeds expert performance in five domains, positioning ScholarQA Bench as distinct from single‑paper QA benchmarks (SciFact, QASA, Multi‑XScience) and the KIWI dataset by offering reproducible, automated scoring for complex multi‑paper literature‑review tasks.
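The citation F1 mentioned above combines precision and recall over the passages an answer cites; below is a minimal sketch, with the set of relevant passages taken as given (how that set is derived follows the paper's pipeline, not this code).

```python
# Standard F1 over cited vs. relevant passages.
def citation_f1(cited: set[str], relevant: set[str]) -> float:
    if not cited or not relevant:
        return 0.0
    precision = len(cited & relevant) / len(cited)
    recall = len(cited & relevant) / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(citation_f1({"p1", "p3", "p9"}, {"p1", "p2", "p3"}))  # 0.666...
```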
Keywords: #gpt-oss:20b-cloud, LLM, OpenScholar, RAG, Scholar-CS, SciFact, Self-feedback, benchmark, bi-encoder, citation, cross-encoder, inference, retrieval pipeline
www.nature.com 13 days ago
https://www.nature.com/articles/d41586-026-00347-9 13 days ago
https://archive.ph/rF0Kg 13 days ago
|
2314.
HN
Agentic search vs. embedding-based search vs. truth layers
Three AI retrieval strategies are examined (embedding-based search, agentic search, and a proposed truth layer), along with their trade-offs in privacy, freshness, and structure. Embedding-based retrieval indexes pre-computed vectors for fast similarity queries but lacks provenance or structured persistence. Agentic search dynamically fetches data from diverse tools, offering contextual richness but suffering from incomplete recall, non-deterministic results, and no persistent canonical inventory, which prevents audit trails, versioning, and cross-tool reuse. The truth layer, a persistent, canonically identified state store, addresses these gaps by deterministically merging observations through explicit rules, tracking provenance and audit information, supporting immutable audit trails and rollback, and furnishing cross-platform, reproducible queries. The author implements this via Neotoma, a structured memory layer built to replace ad-hoc retrieval with verifiable, traceable, and consistent data handling, and illustrates the motivation through experiences with Cursor as an agentic workflow tool that, while intuitive, falters on large, incomplete datasets, motivating a deterministic truth layer for reliable, audit-ready data retrieval and state management.
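A minimal sketch of the truth-layer idea, not Neotoma's API: observations from different tools are merged by an explicit, deterministic rule (here, latest timestamp wins, an assumed rule) while provenance is recorded alongside the resulting state.

```python
# Deterministic merge of observations into canonical state, with provenance.
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    entity_id: str
    field: str
    value: str
    source: str        # which tool reported it
    observed_at: str   # ISO timestamp, used by the merge rule

def merge(observations):
    state, provenance = {}, {}
    for obs in sorted(observations, key=lambda o: o.observed_at):
        key = (obs.entity_id, obs.field)
        state[key] = obs.value                       # later observation wins, deterministically
        provenance.setdefault(key, []).append(obs.source)
    return state, provenance
```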
Keywords: #gpt-oss:20b-cloud, RAG, agentic search, canonical entities, cross-platform, embedding-based, on-demand, provenance, retrieval, semantic similarity, session-scoped, structured store, traceability, truth layer, vector DB, versioning
markmhendrickson.com 13 days ago
|
2317.
HN
Chunk size is query-dependent: a simple multi-scale approach to RAG retrieval
A study on retrieval-augmented generation demonstrates that the optimal chunk size for indexing depends on each individual query, with performance varying widely across datasets and queries; oracle experiments that pick the best chunk size per query consistently outperform any fixed chunk size by 20–40 % in document-level recall@K. To exploit this without per-query model retraining, the authors propose multi-scale indexing: separate indices are built at several sliding-window sizes (e.g., 100, 200, 500 tokens), and at inference time the retrieval lists from all indices are consolidated with Reciprocal Rank Fusion (RRF), which replaces raw similarity values with rank-based scores and aggregates votes from chunks to their parent documents. This yields 1–3 % absolute recall gains across most benchmarks and 1–37 % improvements on specific datasets (such as a 36.7 % boost on TRECCOVID with E5-small), while incurring only the cost of storing multiple chunk representations. The result is a query-aware, multi-scale retrieval strategy that offers a low-cost, model-agnostic approximation of oracle performance without additional training.
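For reference, the rank-based RRF scoring referred to above can be written as follows, where a document accumulates votes from its retrieved chunks across all chunk-size indices; the smoothing constant k (commonly 60) is not specified in the summary.

```latex
% r_i(c): rank of chunk c in the result list R_i of chunk-size index i;
% k: smoothing constant (commonly 60); the sum runs over chunks of d retrieved by each index.
\mathrm{RRF}(d) \;=\; \sum_{i \in \mathcal{I}} \; \sum_{c \in d \,\cap\, R_i} \frac{1}{k + r_i(c)}
```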
Keywords: #gpt-oss:20b-cloud, Chunk size, RAG, RRF, benchmarks, code, context, embeddings, inference, multi-scale, oracle, performance, query-dependent, retrieval, tokens, vector
www.ai21.com 13 days ago
|
2348.
HN
Agentic AI for PHP Developers
This hands‑on series equips intermediate PHP developers with the Claude‑PHP‑Agent framework to build robust, production‑grade AI agents. It begins by introducing core agentic AI concepts—distinguishing agents from raw LLM calls, teaching control‑loop patterns (React, Plan‑Execute, Reflection, Streaming), and outlining a JSON‑schema‑validated tool system—before progressing through essential production readiness techniques such as retry logic, logging, and monitoring. Practical modules cover short‑term and long‑term conversation memory, stateful sessions, efficient retrieval‑augmented generation with chunking and citation, and plan‑execute decomposition for task orchestration. Advanced chapters delve into reflection loops for self‑review, hierarchical and adaptive agent architectures, guardrail design, observability instrumentation, evaluation harnesses, performance optimization via caching and batching, and asynchronous concurrent execution using AMPHP. The curriculum, spanning 35–50 hours with individual chapters lasting 60–120 minutes, culminates in a capstone platform that integrates tools, memory, RAG, planning, orchestration, safety, and monitoring. Prerequisites include PHP 8.4+, Composer, Redis, relational database support, an Anthropic API key, and optionally Docker.
Keywords: #gpt-oss:20b-cloud, Agentic AI, Async, Composer, Docker, JSON schema, LLM APIs, Memory Management, PHP, PlanExecuteLoop, RAG, ReAct, ReactLoop, ReflectionLoop, StreamingLoop, claude-php-agent
codewithphp.com 13 days ago
|
2663.
HN
Agentic search (glob/grep/read) works better than RAG and vector DB
The linked x.com post argues that agentic search methods such as glob, grep, and read work better than Retrieval-Augmented Generation (RAG) and vector-database retrieval; the post itself is only viewable with JavaScript enabled in a supported browser.
Keywords: #gpt-oss:20b-cloud, Agentic search, Help Center, JavaScript, RAG, browser, disabled, enable, glob, grep, read, supported browsers, vector DB
twitter.com 14 days ago
|
2673.
HN
Embedded Vector and Graph Database in Pure Go
sqvect is a pure-Go vector and graph database that stores everything in a single, zero-configuration SQLite file. It combines semantic HNSW vector search with FTS5 keyword matching, fused through Reciprocal Rank Fusion for hybrid retrieval, and offers built-in RAG tables for documents, chat sessions, and messages. A biomimetic Hindsight memory system captures world, bank, opinion, and observation data with retain-recall-observe operations powered by four parallel TEMPR (Temporal, Entity, Memory, Priming) strategies and fusable similarity routing. Row-level security is enforced via ACL attributes, while graph relationships are persisted in directed weighted edge tables that support PageRank and community detection. Memory efficiency comes from SQ8 quantization, cutting RAM usage by ~75 % (≈1.2 GB for 1 M 128-dim vectors with HNSW, 1.0 GB for IVF), and performance is further boosted by WAL mode, connection pooling, and zero-config concurrent access, achieving ~580 inserts/s and ~720 QPS for HNSW, and ~14,500 inserts/s with ~1,230 QPS for IVF, under 128-dim workloads on an Apple M2 Pro. The fully type-safe Go API delivers IntelliSense support, 93 % test coverage, and a CI/CD pipeline that outputs Codecov and Go Report Card badges. It is well suited to local-first or edge RAG applications, personal knowledge bases, and small-to-medium AI agents that need fast vector retrieval, built-in graph processing, safe multi-tenant access, and hybrid keyword/semantic search, but not to >100 M vectors, sub-10 ms latency demands, or non-Go environments.
Keywords: #gpt-oss:20b-cloud, ACL, AI, Edge, Go, HNSW, RAG, SQLite, Search, database, graph, memory, vector
github.com 14 days ago
|