Knowledge bases

RAG pipeline that parses, chunks, embeds, and indexes your documents

A knowledge base in Meko is a RAG (Retrieval-Augmented Generation) pipeline that ingests your documents, breaks them into chunks, generates vector embeddings, and indexes them for semantic search. Agents can then query the knowledge base to find relevant information from your documents.

How it works

When you add a knowledge base to a datapack using the Meko UI, Meko's pg_dist_rag pipeline:

  1. Fetches documents from the source (S3, local filesystem, a web page, or an NFS mount).
  2. Preprocesses the documents, extracting text from PDFs, HTML, Markdown files, plain-text files, images, Parquet, Iceberg, JSON, and more.
  3. Chunks the text into segments (configurable chunk size).
  4. Embeds each chunk using the configured embedding model.
  5. Indexes the embeddings in pgvector for fast similarity search.

All of this happens within your datapack's database; there's no separate vector database to manage.
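The chunk, embed, and index stages above can be sketched in miniature. Everything here is illustrative rather than Meko's actual implementation: the hash-based `embed` stands in for a real embedding model, and the in-memory list stands in for the pgvector table.

```python
import hashlib
import math

def chunk(text, size=200, overlap=40):
    """Split text into fixed-size character chunks with overlap (chunk size is configurable)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk_text, dims=8):
    """Toy stand-in for an embedding model: hash bytes into a unit vector."""
    digest = hashlib.sha256(chunk_text.encode()).digest()
    vec = [b / 255 for b in digest[:dims]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(documents):
    """'Index' each chunk as a (text, vector) pair, standing in for rows in pgvector."""
    index = []
    for doc in documents:
        for piece in chunk(doc):
            index.append((piece, embed(piece)))
    return index
```

In a real datapack the vectors land in a pgvector column instead of a Python list, which is what makes the similarity search happen inside the database.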

Supported document formats

Meko supports documents in:

  • PDF
  • HTML
  • Markdown
  • Plain text
  • Images
  • Video
  • Parquet
  • Iceberg
  • JSON

Documents can be loaded from S3, the local filesystem, a web page, or an NFS-mounted directory using the Meko UI.

Query knowledge

Once indexed, agents can query the knowledge base through the MCP server. The MCP tool for knowledge search handles embedding the query, performing similarity search, and returning relevant chunks.
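At query time the tool embeds the query with the same model used at ingestion, then ranks chunks by vector similarity. A toy cosine-similarity version of that lookup (the function names and the in-memory index are illustrative; a real deployment would use pgvector's distance operators in SQL) might look like:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, top_k=3):
    """Rank (text, vector) pairs by similarity to the query vector; return the best chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The returned chunks are what the agent sees as search results, ready to be folded into its context.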

Next steps