Self-hosting a RAG pipeline
Learn how to self-host the nodes needed to run a RAG pipeline
This guide will show you how to set up a RAG pipeline on your own hardware. You will need a pool already configured and running; see the Running a Pool guide for more information.
The RAG pipeline requires the following components:
- Document retrieval node: A node that fetches documents from the web and transforms them into markdown
- Embedding node: A node that converts documents and queries into searchable vectors
- Search node: A node that creates indexes from the vectors and performs similarity searches
- Extism Runtime: A general-purpose node that runs Extism plugins. We will use it to run the RAG Coordinator plugin
- The RAG Coordinator plugin: A plugin that coordinates the other nodes and provides a single request endpoint for the pipeline
Requirements
- A pool running on the same network. See the Running a Pool guide for more information.
- A host with Docker installed and configured (this can be the same host as the pool).
- Basic knowledge of how to use a terminal and Docker
Starting the retrieval node
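The node can be started with Docker. Here is a sketch of the command; the image name is a placeholder (use the image published by the OpenAgents project), and the environment variables are the ones described below:

```shell
# Sketch: start the document retrieval node and connect it to the pool.
# <retrieval-node-image> is a placeholder for the project's published image.
docker run -d --name retrieval \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  -e POOL_SSL="false" \
  <retrieval-node-image>
```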
Replace POOL_ADDRESS and POOL_PORT with the correct values for your pool. If the pool is using SSL, set POOL_SSL to true.
You can check the logs of the container with docker logs --follow retrieval to see if it started correctly.
By default the node will use a random NODE_TOKEN to authenticate with the pool. You can provide a custom token by setting a valid nostr secret in the docker command:
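A sketch of the same command with a custom token; the image name is a placeholder, and NODE_TOKEN is the variable named above:

```shell
# Sketch: start the retrieval node with a custom NODE_TOKEN
# (a valid nostr secret key generated only for this purpose).
docker run -d --name retrieval \
  -e NODE_TOKEN="<your-nostr-secret>" \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  <retrieval-node-image>
```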
Note: The token is communicated to the pool in plain text, so you should generate a new token only for this purpose.
Starting the embedding node
Using a local model
You can use any local model that is compatible with the Sentence Transformers library.
You can run the model on an Nvidia GPU by setting EMBEDDINGS_TRANSFORMERS_DEVICE to the numeric id of a CUDA device (e.g. 0) and by passing --gpus all to the docker run command. Ensure you have the correct drivers and the NVIDIA container runtime installed.
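Putting this together, a sketch of the command with GPU support; the image name is a placeholder, and the environment variables are the ones described in this section:

```shell
# Sketch: start the embedding node on CUDA device 0.
# <embeddings-node-image> is a placeholder for the project's published image.
docker run -d --name embeddings --gpus all \
  -e EMBEDDINGS_TRANSFORMERS_DEVICE="0" \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  -e POOL_SSL="false" \
  <embeddings-node-image>
```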
Replace POOL_ADDRESS and POOL_PORT with the correct values for your pool. If the pool is using SSL, set POOL_SSL to true.
You can check the logs of the container with docker logs --follow embeddings to see if it started correctly.
By default the node will use a random NODE_TOKEN to authenticate with the pool. You can provide a custom token by setting a valid nostr secret in the docker command:
Note: The token is communicated to the pool in plain text, so you should generate a new token only for this purpose.
Using OpenAI
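A sketch of the command for the OpenAI-backed variant; both the image name and the OPENAI_API_KEY variable name are assumptions here, so check the project's documentation for the exact configuration:

```shell
# Sketch: start the embedding node backed by OpenAI instead of a local model.
# <embeddings-node-image> and OPENAI_API_KEY are assumptions, not confirmed names.
docker run -d --name embeddings \
  -e OPENAI_API_KEY="<your-openai-api-key>" \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  -e POOL_SSL="false" \
  <embeddings-node-image>
```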
Replace POOL_ADDRESS and POOL_PORT with the correct values for your pool. If the pool is using SSL, set POOL_SSL to true.
You can check the logs of the container with docker logs --follow embeddings to see if it started correctly.
By default the node will use a random NODE_TOKEN to authenticate with the pool. You can provide a custom token by setting a valid nostr secret in the docker command:
Note: The token is communicated to the pool in plain text, so you should generate a new token only for this purpose.
Starting the search node
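A sketch of the command; the image name is a placeholder (use the image published by the OpenAgents project), and the environment variables are the ones described below:

```shell
# Sketch: start the search node and connect it to the pool.
# <search-node-image> is a placeholder for the project's published image.
docker run -d --name search \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  -e POOL_SSL="false" \
  <search-node-image>
```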
Replace POOL_ADDRESS and POOL_PORT with the correct values for your pool. If the pool is using SSL, set POOL_SSL to true.
You can check the logs of the container with docker logs --follow search to see if it started correctly.
By default the node will use a random NODE_TOKEN to authenticate with the pool. You can provide a custom token by setting a valid nostr secret in the docker command:
Note: The token is communicated to the pool in plain text, so you should generate a new token only for this purpose.
Starting the Extism Runtime
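A sketch of the command; the image name is a placeholder (use the image published by the OpenAgents project), and the environment variables are the ones described below:

```shell
# Sketch: start the Extism Runtime node and connect it to the pool.
# <extism-runtime-image> is a placeholder for the project's published image.
docker run -d --name extism-runtime \
  -e POOL_ADDRESS="<your-pool-address>" \
  -e POOL_PORT="<your-pool-port>" \
  -e POOL_SSL="false" \
  <extism-runtime-image>
```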
Replace POOL_ADDRESS and POOL_PORT with the correct values for your pool. If the pool is using SSL, set POOL_SSL to true.
You can check the logs of the container with docker logs --follow extism-runtime to see if it started correctly.
By default the node will use a random NODE_TOKEN to authenticate with the pool. You can provide a custom token by setting a valid nostr secret in the docker command:
Note: The token is communicated to the pool in plain text, so you should generate a new token only for this purpose.
Getting the RAG Coordinator plugin
For this step you are only going to need the public direct link to a RAG Coordinator plugin; you can get one from the openagents-rag-coordinator-plugin's Releases page.
Testing the RAG Pipeline
If you’ve been following along, you should have the following containers running:
- retrieval
- embeddings
- search
- extism-runtime
all connected to the pool.
If that’s the case, the RAG pipeline should be ready to use.
Testing the RAG pipeline manually
You can now craft a nostr NIP-90 Job Request to run the RAG pipeline.
Kind 5003 is a custom kind we use for generic OpenAgents jobs; this might change in the future.
Replace $currentTimeInSeconds and $currentTimeInSecondsPLUS10mins with valid values
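A request along these lines should work. This is a sketch: the i tags follow the NIP-90 ["i", data, input-type, relay, marker] layout, but the "param" tag used here to point the extism-runtime at the plugin link from the previous step is illustrative, so check the plugin's documentation for the exact tag layout.

```json
{
  "kind": 5003,
  "created_at": $currentTimeInSeconds,
  "content": "",
  "tags": [
    ["param", "main", "<public-direct-link-to-rag-coordinator-plugin>"],
    ["i", "https://example.com/some-document.html", "url", "", "passage"],
    ["i", "What does the document say about X?", "text", "", "query"],
    ["expiration", "$currentTimeInSecondsPLUS10mins"]
  ]
}
```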
As you can see, this event calls the RAG Coordinator plugin on top of an extism-runtime node, with some inputs (i tags): one or more documents (as URLs or plain text) marked with the "passage" marker, and a query marked with the "query" marker.
After this event is broadcast to the relay, you will receive a Job feedback event with status=="success", after which you will be able to get the job results by fetching a Job result event with kind==6003 (5003+1000) and an e tag equal to the Job Request event id.
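For example, using a standard NIP-01 subscription, the result can be fetched with a filter like the following (the event id is a placeholder):

```json
["REQ", "job-result", {"kinds": [6003], "#e": ["<job-request-event-id>"]}]
```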
If you are using a public Nostr relay, there might be other pools listening for the same kind of job, so you might actually get the job results from someone else. To avoid this you can run your own private relay, or enforce the use of a specific pool by setting your pool's nostr public key in the p tag of the Job Request.
Additionally, you can encrypt the request for the same public key using NIP-04, as explained here.
See the NIP-90 specification for more information on the protocol flow.
Testing the RAG pipeline with our CLI demo
Work in progress
This is a work in progress; the CLI is not yet available.