Virtual Databases (VDB)¶

Virtual Databases (VDBs) provide automatic sharding for horizontal scalability. Documents are distributed across multiple physical databases (shards) based on consistent hashing of document IDs.

VDB is an optional layer - the existing /db/:db API remains unchanged. Use VDBs when you need to scale beyond a single database instance.

Architecture¶

                         ┌─────────────────────────────────────┐
                         │           HTTP API                   │
                         ├──────────────────┬──────────────────┤
                         │  /db/:db         │  /vdb/:vdb       │
                         │  (direct)        │  (sharded)       │
                         └────────┬─────────┴────────┬─────────┘
                                  │                  │
                                  │         ┌───────▼────────┐
                                  │         │  barrel_vdb    │
                                  │         │  (router)      │
                                  │         └───────┬────────┘
                                  │                 │
                         ┌────────▼─────────────────▼─────────┐
                         │         barrel_docdb (unchanged)    │
                         │  ┌─────────┐ ┌─────────┐ ┌────────┐│
                         │  │users_s0 │ │users_s1 │ │ mydb   ││
                         │  └─────────┘ └─────────┘ └────────┘│
                         └─────────────────────────────────────┘

Key principles:

VDB is optional - existing /db/:db API unchanged
No changes to barrel_docdb core modules
Physical shard DBs are regular barrel databases
Documents routed by consistent hash of document ID

Quick Start¶

Create a VDB¶

HTTP APIErlang API

# Create a VDB with 4 shards
curl -X POST "http://localhost:8080/vdb" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"name": "users", "shard_count": 4}'

%% Create a VDB with 4 shards
ok = barrel_vdb:create(<<"users">>, #{shard_count => 4}).

Store and Retrieve Documents¶

Documents are automatically routed to the correct shard based on their ID:

HTTP APIErlang API

# Put a document (routed to correct shard)
curl -X PUT "http://localhost:8080/vdb/users/user123" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"name": "Alice", "email": "alice@example.com"}'

# Get a document (routed to correct shard)
curl "http://localhost:8080/vdb/users/user123" \
  -H "Authorization: Bearer $API_KEY"

# Delete a document
curl -X DELETE "http://localhost:8080/vdb/users/user123" \
  -H "Authorization: Bearer $API_KEY"

%% Put a document
{ok, #{<<"id">> := DocId, <<"rev">> := Rev}} = barrel_vdb:put_doc(<<"users">>, #{
    <<"id">> => <<"user123">>,
    <<"name">> => <<"Alice">>,
    <<"email">> => <<"alice@example.com">>
}).

%% Get a document
{ok, Doc} = barrel_vdb:get_doc(<<"users">>, <<"user123">>).

%% Delete a document
{ok, _} = barrel_vdb:delete_doc(<<"users">>, <<"user123">>).

Query Across Shards¶

Queries use scatter-gather to search all shards and merge results:

HTTP APIErlang API

# Find all active users
curl -X POST "http://localhost:8080/vdb/users/_find" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "where": [{"path": ["active"], "value": true}],
    "limit": 100
  }'

%% Find all active users
{ok, Results, Meta} = barrel_vdb:find(<<"users">>, #{
    where => [{path, [<<"active">>], true}],
    limit => 100
}).

HTTP API Reference¶

VDB Management¶

Method	Endpoint	Description
`POST`	`/vdb`	Create a new VDB
`GET`	`/vdb`	List all VDBs
`GET`	`/vdb/:vdb`	Get VDB info
`DELETE`	`/vdb/:vdb`	Delete a VDB
`GET`	`/vdb/:vdb/_shards`	Get shard assignments
`GET`	`/vdb/:vdb/_replication`	Get replication status
`POST`	`/vdb/:vdb/_import`	Import from regular database
`POST`	`/vdb/:vdb/_shards/:shard/_split`	Split a shard
`POST`	`/vdb/:vdb/_shards/:shard/_merge`	Merge two shards

Document Operations¶

Method	Endpoint	Description
`GET`	`/vdb/:vdb/:doc_id`	Get document (routed)
`PUT`	`/vdb/:vdb/:doc_id`	Put document (routed)
`DELETE`	`/vdb/:vdb/:doc_id`	Delete document (routed)
`POST`	`/vdb/:vdb/_bulk_docs`	Bulk operations

Query Operations¶

Method	Endpoint	Description
`POST`	`/vdb/:vdb/_find`	Query across all shards
`GET`	`/vdb/:vdb/_changes`	Merged changes feed

Creating a VDB¶

Basic Creation¶

curl -X POST "http://localhost:8080/vdb" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"name": "orders", "shard_count": 8}'

With Replication¶

Create a VDB with automatic replication across zones:

curl -X POST "http://localhost:8080/vdb" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "name": "users",
    "shard_count": 4,
    "placement": {
      "replica_factor": 2,
      "zones": ["us-east", "eu-west"]
    }
  }'

Creation Options¶

Option	Type	Default	Description
`name`	string	(required)	VDB name
`shard_count`	integer	4	Number of shards
`hash_function`	string	"phash2"	Hash function ("phash2" or "xxhash")
`placement.replica_factor`	integer	1	Number of replicas per shard
`placement.zones`	array	[]	Preferred zones for placement

VDB Info¶

Get detailed information about a VDB:

curl "http://localhost:8080/vdb/users" \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "name": "users",
  "shard_count": 4,
  "total_docs": 10000,
  "total_disk_size": 52428800,
  "shards": [
    {"db": "users_s0", "doc_count": 2500, "disk_size": 13107200},
    {"db": "users_s1", "doc_count": 2500, "disk_size": 13107200},
    {"db": "users_s2", "doc_count": 2500, "disk_size": 13107200},
    {"db": "users_s3", "doc_count": 2500, "disk_size": 13107200}
  ]
}

Shard Assignments¶

View shard assignments and status:

curl "http://localhost:8080/vdb/users/_shards" \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "shards": [
    {
      "shard_id": 0,
      "primary": "node1@localhost",
      "replicas": ["node2@localhost"],
      "status": "active"
    },
    {
      "shard_id": 1,
      "primary": "node2@localhost",
      "replicas": ["node1@localhost"],
      "status": "active"
    }
  ]
}

Shard Statuses¶

Status	Description
`active`	Normal operation
`splitting`	Shard is being split
`merging`	Shard is being merged
`migrating`	Data is being moved
`readonly`	Read-only mode

Bulk Operations¶

Write multiple documents in a single request:

curl -X POST "http://localhost:8080/vdb/users/_bulk_docs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "docs": [
      {"id": "user1", "name": "Alice"},
      {"id": "user2", "name": "Bob"},
      {"id": "user3", "name": "Charlie"}
    ]
  }'

Response:

{
  "results": [
    {"id": "user1", "ok": true, "rev": "1-abc123"},
    {"id": "user2", "ok": true, "rev": "1-def456"},
    {"id": "user3", "ok": true, "rev": "1-ghi789"}
  ]
}

Changes Feed¶

Get merged changes from all shards:

# Get changes since beginning
curl "http://localhost:8080/vdb/users/_changes?since=first" \
  -H "Authorization: Bearer $API_KEY"

# Get changes with limit
curl "http://localhost:8080/vdb/users/_changes?since=first&limit=100" \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "changes": [
    {"id": "user1", "seq": 1, "deleted": false},
    {"id": "user2", "seq": 2, "deleted": false}
  ],
  "last_seq": 2
}

Data Import¶

Import documents from a regular database into a VDB. Documents are automatically distributed across shards based on their IDs.

Import All Documents¶

HTTP APIErlang API

curl -X POST "http://localhost:8080/vdb/users/_import" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"source_db": "legacy_users"}'

{ok, Stats} = barrel_vdb_import:import(<<"legacy_users">>, <<"users">>, #{}).

Response:

{
  "docs_read": 10000,
  "docs_written": 9950,
  "docs_skipped": 50,
  "errors": 0,
  "started_at": 1736712345000,
  "finished_at": 1736712400000,
  "status": "completed"
}

Import with Filter¶

Import only specific documents:

HTTP APIErlang API

curl -X POST "http://localhost:8080/vdb/users/_import" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "source_db": "legacy_data",
    "filter": {
      "where": [{"path": ["type"], "value": "user"}]
    },
    "batch_size": 500,
    "on_conflict": "skip"
  }'

{ok, Stats} = barrel_vdb_import:import(<<"legacy_data">>, <<"users">>, #{
    filter => #{where => [{path, [<<"type">>], <<"user">>}]},
    batch_size => 500,
    on_conflict => skip
}).

Import Options¶

Option	Type	Default	Description
`source_db`	string	(required)	Source database name
`filter`	object	-	Filter documents (same format as `_find`)
`batch_size`	integer	100	Documents per batch
`on_conflict`	string	`skip`	Conflict handling: `skip`, `overwrite`, `merge`

Replication Status¶

Check replication status for a VDB:

curl "http://localhost:8080/vdb/users/_replication" \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "vdb_name": "users",
  "replica_factor": 2,
  "shard_count": 4,
  "shards": [
    {
      "shard_id": 0,
      "replication_tasks": [
        {"source": "users_s0", "target": "http://node2:8080/db/users_s0", "status": "active"}
      ]
    }
  ]
}

Erlang API Reference¶

VDB Lifecycle¶

%% Create a VDB
ok = barrel_vdb:create(VdbName, Opts).
%% Opts: #{shard_count => 4, placement => #{replica_factor => 2}}

%% Check if VDB exists
true = barrel_vdb:exists(VdbName).

%% List all VDBs
{ok, [<<"users">>, <<"orders">>]} = barrel_vdb:list().

%% Get VDB info
{ok, Info} = barrel_vdb:info(VdbName).

%% Delete a VDB
ok = barrel_vdb:delete(VdbName).

Document Operations¶

%% Put document (ID auto-generated if not provided)
{ok, #{<<"id">> := DocId, <<"rev">> := Rev}} = barrel_vdb:put_doc(VdbName, Doc).

%% Put with specific revision (for updates)
{ok, Result} = barrel_vdb:put_doc(VdbName, DocId, Doc#{<<"_rev">> => Rev}).

%% Get document
{ok, Doc} = barrel_vdb:get_doc(VdbName, DocId).

%% Get with options
{ok, Doc} = barrel_vdb:get_doc(VdbName, DocId, #{include_revs => true}).

%% Delete document
{ok, Result} = barrel_vdb:delete_doc(VdbName, DocId).

Query Operations¶

%% Find documents (scatter-gather across all shards)
{ok, Results, Meta} = barrel_vdb:find(VdbName, #{
    where => [{path, [<<"type">>], <<"user">>}],
    limit => 100
}).

%% Get changes (merged from all shards)
{ok, Changes} = barrel_vdb:get_changes(VdbName, first).
{ok, Changes} = barrel_vdb:get_changes(VdbName, Since, #{limit => 100}).

%% Fold over all documents
{ok, Result} = barrel_vdb:fold_docs(VdbName, Fun, Acc, Opts).

Bulk Operations¶

%% Bulk write documents
Docs = [
    #{<<"id">> => <<"doc1">>, <<"value">> => 1},
    #{<<"id">> => <<"doc2">>, <<"value">> => 2}
],
{ok, Results} = barrel_vdb:bulk_docs(VdbName, Docs).

Shard Rebalancing¶

When shards become unbalanced, you can split or merge them.

Split a Shard¶

Split a large shard into two. The original shard keeps the lower half of its hash range, and a new shard is created for the upper half.

HTTP APIErlang API

# Split shard 0 into two shards
curl -X POST "http://localhost:8080/vdb/users/_shards/0/_split" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY"

# With batch size option
curl -X POST "http://localhost:8080/vdb/users/_shards/0/_split" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"batch_size": 500}'

Response:

{
  "ok": true,
  "new_shard_id": 4,
  "message": "Shard 0 split into shards 0 and 4"
}

%% Split shard 0 into two shards
{ok, NewShardId} = barrel_shard_rebalance:split_shard(<<"users">>, 0).

%% Split with progress callback
ProgressFun = fun(#{phase := Phase, migrated := N, total := T}) ->
    io:format("~p: ~p/~p~n", [Phase, N, T])
end,
{ok, NewShardId} = barrel_shard_rebalance:split_shard(<<"users">>, 0, #{
    progress_callback => ProgressFun,
    batch_size => 100
}).

Merge Shards¶

Merge two adjacent shards. Only shards that are adjacent in hash space can be merged.

HTTP APIErlang API

# Merge shard 2 into shard 3
curl -X POST "http://localhost:8080/vdb/users/_shards/2/_merge" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"target_shard": 3}'

# With batch size option
curl -X POST "http://localhost:8080/vdb/users/_shards/2/_merge" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"target_shard": 3, "batch_size": 500}'

Response:

{
  "ok": true,
  "message": "Shard 2 merged into shard 3"
}

Error (shards not adjacent):

{
  "error": "Shards are not adjacent in hash space"
}

%% Check if shards can be merged
{ok, true} = barrel_shard_rebalance:can_merge(<<"users">>, 0, 1).

%% Merge shards 0 and 1
ok = barrel_shard_rebalance:merge_shards(<<"users">>, 0, 1).

Estimate Migration¶

Before splitting/merging, estimate the work involved:

%% Estimate documents to migrate
{ok, DocCount} = barrel_shard_rebalance:estimate_migration(<<"users">>, 0, 1).

Best Practices¶

Shard Count Selection¶

Start small: Begin with 4-8 shards
Plan for growth: Choose a power of 2 (4, 8, 16, 32)
Consider document count: ~1M docs per shard is reasonable
Monitor disk usage: Split shards when they exceed 10GB

Document ID Design¶

Since documents are routed by ID hash:

Use UUIDs or random IDs for even distribution
Avoid sequential IDs (doc1, doc2, doc3) which may cluster
Consider composite IDs (user:123:order:456) for related documents

Query Optimization¶

Use specific selectors to reduce scatter-gather overhead
Add limits to prevent large result sets
Consider denormalization to avoid cross-shard joins

Troubleshooting¶

Uneven Shard Distribution¶

Check document counts per shard:

curl "http://localhost:8080/vdb/users" -H "Authorization: Bearer $API_KEY" | jq '.shards'

If distribution is uneven: 1. Check document ID patterns 2. Consider splitting large shards 3. Ensure hash function is consistent

Replication Lag¶

Check replication status:

curl "http://localhost:8080/vdb/users/_replication" -H "Authorization: Bearer $API_KEY"

If tasks show "error" status: 1. Verify network connectivity between nodes 2. Check authentication configuration 3. Review node logs for errors

Query Performance¶

For slow queries:

Check if query can be satisfied by fewer shards
Add appropriate indexes on shard databases
Consider query result caching at application level