Engram is a decentralized AI memory layer built on Bittensor subnet 450. It lets you store text, images, and PDFs as content-addressed vector embeddings that are replicated across a network of incentivized miners and permanently archived on Arweave.

How does Engram differ from Pinecone or Weaviate?

Unlike centralized vector databases, Engram has no single point of failure or central authority. Data is stored across multiple miners on the Bittensor blockchain, each of whom must cryptographically prove they hold your data. There is no monthly subscription; miners are paid in TAO tokens by the network.

What is a content-addressed CID?

A CID (content identifier) is a SHA-256 hash derived from the vector embedding and metadata of your stored content. The same text always produces the same CID, regardless of which miner stores it, which makes storage verifiable and tamper-evident.
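As a rough illustration of content addressing, the sketch below hashes a canonical JSON serialization of an embedding and its metadata. The function name `compute_cid`, the serialization format, and the rounding step are assumptions made for this example; Engram's actual derivation may differ. The `v1::` prefix mirrors the CIDs shown in the SDK examples.

```python
import hashlib
import json


def compute_cid(embedding: list[float], metadata: dict) -> str:
    """Hypothetical CID derivation: SHA-256 over a canonical JSON payload."""
    payload = {
        # Round so tiny float noise does not change the hash (an assumption).
        "embedding": [round(x, 6) for x in embedding],
        "metadata": metadata,
    }
    # sort_keys plus fixed separators give a byte-identical serialization
    # for equal inputs, which is what makes the hash reproducible.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "v1::" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = compute_cid([0.1, 0.2, 0.3], {"source": "arxiv", "year": "2018"})
b = compute_cid([0.1, 0.2, 0.3], {"year": "2018", "source": "arxiv"})
c = compute_cid([0.1, 0.2, 0.4], {"source": "arxiv", "year": "2018"})
print(a == b)  # same content, different key order
print(a == c)  # different embedding
```

Because the identifier depends only on the content, any party holding the same embedding and metadata can recompute it and compare.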

How do I store data on Engram?

You can use the Python SDK (pip install engram-subnet), the CLI, or the web playground at theengram.space/playground. Each method returns a permanent CID you can use to retrieve the data later.

What is Bittensor subnet 450?

Subnet 450 is Engram's slot on the Bittensor network. Bittensor is a decentralized machine learning network where miners and validators earn TAO tokens for running useful AI services. Engram's subnet uses those incentives to ensure permanent, verifiable vector storage.

Python SDK

EngramClient is a lightweight HTTP client for a single Engram miner. It can store text, images, PDFs, URLs, and conversations, query with metadata filters, retrieve by CID, delete, and list records. Text ingestion needs no extra dependencies; PDF ingestion requires pypdf.

Install

bash

pip install engram-subnet
# For PDF support
pip install engram-subnet pypdf

EngramClient

python

from engram.sdk import EngramClient
client = EngramClient(
    miner_url="http://72.62.2.34:8091",   # or use from_subnet() for auto-discovery
    timeout=30.0,
)

Parameter	Type	Default	Description
`miner_url`	str	"http://127.0.0.1:8091"	Base URL of the miner's HTTP server
`timeout`	float	30.0	Request timeout in seconds
`namespace`	str | None	None	Private collection name; enables encryption
`namespace_key`	str | None	None	Secret key for the namespace (min 16 chars)

from_subnet()

Auto-discovers the best available miner from the Bittensor metagraph. Probes the top miners by incentive score in parallel and returns a client pointed at the fastest responsive one.

python

# One line — no miner URL needed
client = EngramClient.from_subnet(netuid=450)

Parameter	Type	Default	Description
`netuid`	int	450	Subnet UID to query
`network`	str	"finney"	Subtensor network — "finney", "test", or ws:// endpoint
`timeout`	float	30.0	Timeout for the returned client
`probe_timeout`	float	3.0	Timeout for each health probe during discovery
`top_n`	int	5	Number of top miners to probe (picks by incentive rank)

Note

Requires bittensor to be installed. Raises RuntimeError if no miners are reachable.
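The discovery behaviour described above (probe the top miners in parallel, keep the fastest responsive one) can be approximated with a thread pool. This is a sketch only: `pick_fastest` and the `probe` callable are hypothetical names, the URL list is assumed to be already ranked by incentive, and each probe is assumed to enforce its own timeout. It does not show how the SDK itself reads the metagraph.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def pick_fastest(urls: list[str], probe: Callable[[str], bool], top_n: int = 5) -> str:
    """Probe the first top_n URLs in parallel; return the first healthy one."""
    candidates = urls[:top_n]
    if not candidates:
        raise RuntimeError("no miners to probe")
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = {pool.submit(probe, url): url for url in candidates}
        # as_completed yields futures in the order they finish, so the
        # first healthy result belongs to the fastest responsive miner.
        for future in as_completed(futures):
            try:
                if future.result():
                    return futures[future]
            except Exception:
                continue  # a failed probe counts as "unreachable"
    raise RuntimeError("no miners are reachable")


def fake_probe(url: str) -> bool:
    """Stand-in for an HTTP health check, used only for this demo."""
    delays = {"http://a:8091": 0.3, "http://b:8091": 0.01}
    if url not in delays:
        raise ConnectionError(url)
    time.sleep(delays[url])
    return True


best = pick_fastest(["http://a:8091", "http://b:8091", "http://c:8091"], fake_probe)
print(best)
```

Note that leaving the `with` block waits for the slower probes to finish; a production version might cancel them instead.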

Private namespaces

Pass namespace and namespace_key to store data in an encrypted, private collection. Text is encrypted with AES-256-GCM client-side before being sent to any miner.

python

private = EngramClient(
    "http://miner:8091",
    namespace="company-docs",
    namespace_key="your-secret-key-min-16-chars",
)
cid = private.ingest("Q4 revenue was $4.2M")  # encrypted before leaving your machine
results = private.query("revenue figures")      # decrypted client-side

See Private Namespaces for the full encryption spec and threat model.

ingest()

python

def ingest(self, text: str, metadata: dict | None = None) -> str: ...

Embed and store text on the miner. Returns a CID string.

python

cid = client.ingest(
    "BERT uses bidirectional encoder representations.",
    metadata={"source": "arxiv", "year": "2018"}
)
print(cid)  # v1::a3f2b1c4d5e6f7...

Parameter	Type	Description
`text`	str	Text to embed and store (max 8192 chars)
`metadata`	dict | None	Optional key-value metadata (max 4 KB JSON)

Raises: MinerOfflineError, IngestError, InvalidCIDError
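The limits in the table above can be checked locally before a request is sent. The helper below is a hypothetical pre-flight check, not part of the SDK; it assumes "4 KB" means 4096 bytes of UTF-8 encoded JSON and that the text limit counts Python characters.

```python
import json
from typing import Optional

MAX_TEXT_CHARS = 8192          # from the parameter table
MAX_METADATA_BYTES = 4 * 1024  # "4 KB JSON", assumed to be 4096 bytes


def validate_ingest_payload(text: str, metadata: Optional[dict] = None) -> None:
    """Raise ValueError if the payload exceeds the documented limits."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} chars; limit is {MAX_TEXT_CHARS}")
    if metadata is not None:
        size = len(json.dumps(metadata).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            raise ValueError(f"metadata is {size} bytes; limit is {MAX_METADATA_BYTES}")


# Within limits, so nothing is raised.
validate_ingest_payload(
    "BERT uses bidirectional encoder representations.",
    {"source": "arxiv", "year": "2018"},
)
```

Calling this before `client.ingest(...)` turns an oversized payload into a local error rather than a round trip to the miner.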

ingest_image()

Describe an image with Grok Vision (xAI) and store the description as a searchable memory. The raw image bytes are never sent to the miner — only the AI-generated description is embedded and stored. A content_cid (SHA-256 of the image) is stored as metadata for integrity verification.

python

result = client.ingest_image(
    "photo.jpg",                      # path, or raw bytes
    xai_api_key="xai-...",            # get one at console.x.ai
    metadata={"user_id": "u_123"},    # optional extra metadata
)
print(result["cid"])          # v1::a3f2b1... — use this for search
print(result["description"])  # "A photograph of a whiteboard showing..."
print(result["content_cid"])  # sha256:abc123... — integrity check
print(result["filename"])     # "photo.jpg"
# Search by what's in the image later:
results = client.query("whiteboard diagram with architecture")

Parameter	Type	Description
`source`	str | Path | bytes	Image file path or raw bytes
`xai_api_key`	str	xAI API key for Grok Vision (required)
`mime_type`	str | None	MIME type, e.g. "image/jpeg"; auto-detected from extension if omitted
`metadata`	dict | None	Optional extra metadata

Returns: dict with cid, description, content_cid, filename
Raises: MinerOfflineError, IngestError, RuntimeError (Grok API failure)

Note

Get a free xAI API key at console.x.ai. Grok Vision supports JPEG, PNG, GIF, and WebP.
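Since content_cid is described as the SHA-256 of the image, a stored record can be checked against a local copy of the file. The sketch below assumes the value is the lowercase hex digest prefixed with `sha256:`, as the example output suggests; `content_cid` and `verify_content` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def content_cid(source: Union[str, Path, bytes]) -> str:
    """Hash raw bytes, or the bytes of the file at the given path."""
    data = source if isinstance(source, bytes) else Path(source).read_bytes()
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_content(source: Union[str, Path, bytes], expected: str) -> bool:
    """True if the local content matches the stored content_cid."""
    return content_cid(source) == expected


stored = content_cid(b"fake image bytes")          # value kept in metadata
print(verify_content(b"fake image bytes", stored))  # unchanged content
print(verify_content(b"tampered bytes", stored))    # modified content
```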

ingest_pdf()

Extract text from a PDF and store it as a searchable memory. Requires pypdf. Up to 8192 characters of the extracted text are embedded; the SHA-256 of the raw PDF is stored as content_cid.

bash

pip install pypdf

python

result = client.ingest_pdf(
    "research_paper.pdf",             # path, or raw bytes
    metadata={"category": "research"},
)
print(result["cid"])          # v1::...
print(result["pages"])        # 12
print(result["chars"])        # 48293
print(result["content_cid"])  # sha256:...
# Search the PDF content later:
results = client.query("transformer attention mechanism")

Parameter	Type	Description
`source`	str | Path | bytes	PDF file path or raw bytes
`metadata`	dict | None	Optional extra metadata

Returns: dict with cid, pages, chars, content_cid, filename
Raises: MinerOfflineError, IngestError, ImportError (pypdf missing), ValueError (image-only PDF)

Note

Image-only / scanned PDFs have no extractable text. Run OCR first (e.g. pytesseract) or use ingest_image() per page.

ingest_url()

Fetch a web page, strip navigation and boilerplate, and store the readable text as a memory. SSRF protection is built in — private/loopback addresses are blocked.

python

result = client.ingest_url(
    "https://arxiv.org/abs/1706.03762",
    metadata={"category": "research"},
)
print(result["cid"])    # v1::...
print(result["title"])  # "Attention Is All You Need"
print(result["chars"])  # 6842
print(result["url"])    # final URL after redirects
# Search later:
results = client.query("transformer architecture paper")

Parameter	Type	Description
`url`	str	HTTP or HTTPS URL to fetch
`metadata`	dict | None	Optional extra metadata merged with auto-extracted title/source

Returns: dict with cid, url, title, chars
Raises: ValueError (invalid URL, private address), RuntimeError (fetch failure, no readable text)
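To give a sense of what blocking private and loopback addresses involves, here is a minimal check built on the standard library. It is an illustration under assumptions, not the SDK's implementation: `check_url` and `is_allowed` are hypothetical names, and the sketch only inspects IP literals and `localhost`. A complete guard would also resolve hostnames and re-check after redirects.

```python
import ipaddress
from urllib.parse import urlparse


def check_url(url: str) -> None:
    """Raise ValueError for non-HTTP(S) URLs and private or loopback targets."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("loopback host is blocked")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return  # a hostname, not an IP literal; not resolved in this sketch
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"private address is blocked: {ip}")


def is_allowed(url: str) -> bool:
    try:
        check_url(url)
        return True
    except ValueError:
        return False


for candidate in ("https://arxiv.org/abs/1706.03762", "http://127.0.0.1:8091/", "file:///etc/passwd"):
    print(candidate, is_allowed(candidate))
```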

ingest_conversation()

Store a conversation thread as individual turn memories. Each message is embedded separately so individual turns are semantically searchable. A shared session_id links them.

python

messages = [
    {"role": "user",      "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user",      "content": "Tell me more about Paris."},
]
cids = client.ingest_conversation(
    messages,
    session_id="session_abc123",
    metadata={"user_id": "u_456"},
)
print(cids)
# ["v1::a3f2...", "v1::b2e8...", "v1::c9f4..."]
# Retrieve conversation turns later:
results = client.query("capital city France")
# Returns the turn that mentioned Paris

Parameter	Type	Description
`messages`	list[dict]	List of {"role": ..., "content": ...} dicts
`session_id`	str	Shared ID linking all turns — stored as metadata
`metadata`	dict | None	Optional extra metadata added to every turn

Returns: list of CID strings — one per message turn

Note

Filters empty messages automatically. Each stored record includes role, session_id, turn, and timestamp in its metadata.
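The per-turn records described in the note might be assembled roughly as follows. This is a sketch under assumptions: `build_turn_records` is a hypothetical helper, turn numbers are taken to be zero-based and counted over non-empty messages only, and the timestamp is taken to be Unix seconds. The SDK may choose differently on each point.

```python
import time
from typing import Optional


def build_turn_records(messages: list, session_id: str, metadata: Optional[dict] = None) -> list:
    """Turn a message list into one {text, metadata} record per non-empty turn."""
    records = []
    turn = 0
    for message in messages:
        content = message.get("content") or ""
        if not content.strip():
            continue  # empty messages are filtered out
        records.append({
            "text": content,
            "metadata": {
                **(metadata or {}),          # caller metadata on every turn
                "role": message.get("role", "user"),
                "session_id": session_id,    # shared ID linking the turns
                "turn": turn,
                "timestamp": time.time(),
            },
        })
        turn += 1
    return records


records = build_turn_records(
    [
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": ""},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    session_id="session_abc123",
    metadata={"user_id": "u_456"},
)
print([(r["metadata"]["turn"], r["metadata"]["role"]) for r in records])
```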

query()

python

def query(self, text: str, top_k: int = 10, filter: dict | None = None) -> list[dict]: ...

Semantic search over stored embeddings — works across text, images, PDFs, URLs, and conversation turns.

python

# Basic search
results = client.query("how does self-attention work?", top_k=10)
# [
#   {"cid": "v1::a3f2b1...", "score": 0.9821, "metadata": {"source": "arxiv"}},
#   {"cid": "v1::b2e8c1...", "score": 0.8847, "metadata": {"type": "url"}},
# ]
# Filter by metadata — AND semantics (all conditions must match)
results = client.query(
    "revenue figures",
    top_k=5,
    filter={"user_id": "u_123", "type": "text"},
)
# Only conversation turns for a specific session
turns = client.query(
    "Paris",
    filter={"session_id": "session_abc123", "role": "assistant"},
)

Parameter	Type	Description
`text`	str	Natural language query
`top_k`	int	Maximum results to return (default 10)
`filter`	dict | None	AND-match metadata filter; all key/value pairs must match
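AND semantics mean a record qualifies only when every key in the filter is present in its metadata with an equal value. The small function below illustrates that rule on in-memory data; `matches_filter` is a hypothetical name, and exact equality is an assumption about how values are compared.

```python
from typing import Optional


def matches_filter(metadata: dict, conditions: Optional[dict]) -> bool:
    """True if every key/value pair in conditions appears in metadata."""
    if not conditions:
        return True  # no filter means everything matches
    return all(
        key in metadata and metadata[key] == value
        for key, value in conditions.items()
    )


records = [
    {"cid": "v1::aaa", "metadata": {"user_id": "u_123", "type": "text"}},
    {"cid": "v1::bbb", "metadata": {"user_id": "u_123", "type": "image"}},
    {"cid": "v1::ccc", "metadata": {"user_id": "u_999", "type": "text"}},
]
wanted = {"user_id": "u_123", "type": "text"}
hits = [r["cid"] for r in records if matches_filter(r["metadata"], wanted)]
print(hits)
```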

get()

Retrieve a stored record by its CID. Returns the metadata (not the raw embedding vector).

python

record = client.get("v1::a3f2b1c4d5e6f7...")
if record:
    print(record["cid"])       # v1::a3f2b1...
    print(record["metadata"])  # {"source": "arxiv", "title": "Attention Is All You Need"}
else:
    print("Not found")

Returns: dict with cid and metadata, or None if not found

delete()

Remove a stored record by its CID. The operation is idempotent.

python

deleted = client.delete("v1::a3f2b1c4d5e6f7...")
print(deleted)  # True if it existed, False if not found

Returns: bool — True if deleted, False if CID was not found
Raises: MinerOfflineError

list()

List stored records with optional metadata filtering and pagination.

python

# All records (first page)
records = client.list(limit=50, offset=0)
# Filter by type
image_records = client.list(filter={"type": "image"})
# All memories for a user, paginated
page1 = client.list(filter={"user_id": "u_123"}, limit=20, offset=0)
page2 = client.list(filter={"user_id": "u_123"}, limit=20, offset=20)
for r in page1:
    print(r["cid"], r["metadata"].get("title", ""))

Parameter	Type	Description
`filter`	dict | None	AND-match metadata filter
`limit`	int	Max records per page (default 50)
`offset`	int	Records to skip (default 0)

Returns: list of dicts with cid and metadata
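For walking through every page, a small generator can wrap the limit/offset calls. The sketch assumes that a page shorter than the requested limit (or an empty page) marks the end of the results, which the docs do not state explicitly. `iter_records` is a hypothetical helper; it takes any callable with the same keyword arguments as `client.list`, and a fake one is used here so the example runs without a miner.

```python
from typing import Callable, Iterator, Optional


def iter_records(list_page: Callable, filter: Optional[dict] = None, page_size: int = 50) -> Iterator[dict]:
    """Yield records page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = list_page(filter=filter, limit=page_size, offset=offset)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # assumed end-of-results signal
        offset += page_size


# In-memory stand-in for client.list, for demonstration only.
_store = [{"cid": f"v1::{i:04d}", "metadata": {}} for i in range(45)]


def fake_list(filter=None, limit=50, offset=0):
    return _store[offset:offset + limit]


all_records = list(iter_records(fake_list, page_size=20))
print(len(all_records))
```

With a real client, the call would presumably look like `iter_records(client.list, filter={"user_id": "u_123"}, page_size=20)`.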

batch_ingest_file()

Ingest all records from a JSONL file. Each line must be a JSON object with a text key.

python

# corpus.jsonl format (one JSON object per line):
# {"text": "First entry"}
# {"text": "Second entry", "metadata": {"category": "ml"}}
cids = client.batch_ingest_file("corpus.jsonl")
print(f"Ingested {len(cids)} records")
# With error tracking
cids, errors = client.batch_ingest_file("corpus.jsonl", return_errors=True)
for err in errors:
    print(f"Skipped: {err}")
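It can be useful to check a JSONL file before sending it to a miner. The parser below is a hypothetical pre-check, not the SDK's own loader: it applies the stated rule (each line is a JSON object with a text key), skips blank lines, and collects a message for every line it rejects. The error wording is invented for this example.

```python
import json


def parse_jsonl(lines):
    """Split JSONL lines into (text, metadata) records and error messages."""
    records, errors = [], []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict) or not isinstance(obj.get("text"), str):
            errors.append(f"line {lineno}: missing 'text' key")
            continue
        records.append((obj["text"], obj.get("metadata")))
    return records, errors


sample = [
    '{"text": "First entry"}',
    '{"text": "Second entry", "metadata": {"category": "ml"}}',
    "not json",
    '{"metadata": {}}',
    "",
]
records, errors = parse_jsonl(sample)
print(len(records), len(errors))
```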

health() / is_online()

python

# Check liveness — raises MinerOfflineError if unreachable
info = client.health()
# {"status": "ok", "vectors": 42156, "uid": 7}
# Safe check — never raises
if client.is_online():
    cid = client.ingest("...")

Multi-miner pattern

For redundancy, ingest the same text into multiple miners.

python

from engram.sdk import EngramClient, MinerOfflineError
miners = [
    EngramClient("http://miner1:8091"),
    EngramClient("http://miner2:8091"),
    EngramClient("http://miner3:8091"),
]
cids = []
for miner in miners:
    try:
        cids.append(miner.ingest("Critical knowledge."))
    except MinerOfflineError:
        print(f"Miner offline: {miner.miner_url}")
print(f"Stored on {len(cids)}/{len(miners)} miners")

Note

The same text always produces the same CID across every miner — CIDs are content-addressed, not location-addressed.
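Because every miner should return the same CID for the same text, a replication loop can also act as a consistency check. The sketch below uses duck-typed clients with `ingest()` and `miner_url`, as in the example above. `replicate` and `FakeMiner` are hypothetical names, and `ConnectionError` stands in for the SDK's MinerOfflineError so that the example runs without the SDK installed.

```python
import hashlib


def replicate(clients, text, offline_errors=(ConnectionError,)):
    """Ingest text on every client; return ({url: cid}, [failed urls])."""
    stored, failed = {}, []
    for client in clients:
        try:
            stored[client.miner_url] = client.ingest(text)
        except offline_errors:
            failed.append(client.miner_url)
    # Content addressing implies one distinct CID across all miners.
    if len(set(stored.values())) > 1:
        raise RuntimeError(f"miners returned different CIDs: {stored}")
    return stored, failed


class FakeMiner:
    """In-memory stand-in for EngramClient, for demonstration only."""

    def __init__(self, miner_url, online=True):
        self.miner_url = miner_url
        self.online = online

    def ingest(self, text):
        if not self.online:
            raise ConnectionError(self.miner_url)
        return "v1::" + hashlib.sha256(text.encode("utf-8")).hexdigest()


miners = [
    FakeMiner("http://miner1:8091"),
    FakeMiner("http://miner2:8091"),
    FakeMiner("http://miner3:8091", online=False),
]
stored, failed = replicate(miners, "Critical knowledge.")
print(f"Stored on {len(stored)}/{len(miners)} miners; offline: {failed}")
```

With the real SDK, passing `offline_errors=(MinerOfflineError,)` would match the exception used in the example above.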

engram docs · v0.1