Benchmarks¶
Performance benchmarks for Barrel DocDB operations using the built-in benchmark suite.
Test Environment
- Hardware: Apple M1, 16GB RAM, SSD
- Erlang/OTP: 27
- Dataset: 5,000 documents (~500 bytes each)
- Database: Single node, default configuration
CRUD Operations¶
Basic document operations show strong single-document performance:
| Operation | Throughput | p50 Latency | p99 Latency |
|---|---|---|---|
| Insert | 5,785 ops/s | 147 us | 252 us |
| Read | 111,111 ops/s | 8 us | 22 us |
| Update | 118,343 ops/s | 8 us | 21 us |
| Delete | 5,660 ops/s | 8 us | 20 us |
Key observations:
- Single-document reads are very fast (8 us median)
- Updates are read-modify-write cycles with excellent latency
- Writes include RocksDB sync and indexing overhead
- Bulk inserts can achieve higher throughput with batching
Query Performance¶
Query performance varies based on query pattern and result set size.
Index-Only Queries¶
These queries use include_docs => false and benefit from pure index scans:
| Query Type | Throughput | p50 Latency | Notes |
|---|---|---|---|
| Prefix with LIMIT 10 | 39,354 ops/s | 23 us | Autocomplete use case |
| Prefix (all matches) | 44,033 ops/s | 18 us | ~1,111 docs returned |
| Simple equality + LIMIT 10 | 27,800 ops/s | 27 us | Early termination |
| Pure compare + LIMIT 10 | 6,222 ops/s | 157 us | Range scan with limit |
| Pure compare (age>50) | 2,101 ops/s | 459 us | ~2,380 docs returned |
| Pure Top-K (ORDER BY + LIMIT) | 3,889 ops/s | 233 us | No filter, just sort |
| Selective equality | 488 ops/s | 2.0 ms | ~1,666 docs (1/3) |
| Nested path | 55 ops/s | 18 ms | Nested object access |
| Top-K with filter | 82 ops/s | 12 ms | ORDER BY + LIMIT + filter |
Paginated Queries¶
Paginated queries with continuation tokens deliver excellent performance:
| Page Size | Throughput | p50 Latency |
|---|---|---|
| 100 docs/page | 4,906 pages/s | 187 us |
| 500 docs/page | 3,163 pages/s | 347 us |
Pagination is Essential
Always paginate large result sets. Fetching thousands of documents in a single query is slow and memory-intensive. Use limit and continuation tokens to stream results efficiently:
%% First page
{ok, Results, #{continuation := Token}} =
barrel_docdb:find(Db, #{where => Query, limit => 100}).
%% Next pages
{ok, More, #{continuation := NextToken}} =
barrel_docdb:find(Db, #{where => Query, limit => 100, continuation => Token}).
Paginated queries run in microseconds per page vs hundreds of milliseconds for unbounded queries.
Changes Feed¶
| Operation | Throughput | p50 Latency | Notes |
|---|---|---|---|
| Full scan (5K docs) | - | 98 ms | One-time scan |
| Incremental (100/batch) | 2,857 batches/s | 319 us | Continuous polling |
| Subscription notification | 13,823 ops/s | 65 us | Pub/sub latency |
Subscription latency of 65 us is excellent for real-time applications.
Architecture: ARS Model¶
Barrel DocDB is built on an ARS (Append, Reduce, Stream) storage model:
- Append: All writes are append-only with MVCC versioning
- Reduce: Indexes are materialized views that reduce over the log
- Stream: Changes feed provides a replayable event stream
This architecture enables:
- Flexible query engines: The storage layer is decoupled from query execution. Different query engines can be built on top of the same storage.
- Custom databases: The ARS model can support different data models (document, graph, time-series) with the same underlying storage.
- Efficient replication: Append-only logs are naturally suited for P2P sync.
- Time-travel queries: MVCC enables querying historical states.
The benchmark numbers reflect the current document query engine. Future query engines could optimize for different access patterns (e.g., graph traversal, analytics) while reusing the same storage layer.
When to Use Barrel DocDB¶
Best Use Cases¶
| Use Case | Why |
|---|---|
| P2P Replication | Built-in chain, group, fanout patterns |
| Real-time subscriptions | 65 us notification latency |
| Prefix/autocomplete | 18-23 us query latency |
| Edge computing | Embedded in Erlang, sync when online |
| MVCC conflict detection | Revision tracking built-in |
| Paginated APIs | Fast continuation-based pagination |
Trade-offs¶
When other tools may be better:
| If you need... | Consider instead |
|---|---|
| Maximum single-node query speed | SQLite with JSON1, DuckDB |
| Simple key-value access | Direct RocksDB, ETS |
| Complex SQL analytics | PostgreSQL, DuckDB |
| Full-text search | Meilisearch, Elasticsearch |
Honest Comparison¶
For single-node performance only, other databases will often be faster:
- SQLite: 3-10x faster for complex queries, excellent JSON support
- RocksDB direct: 2-5x faster for raw key-value access
- Mnesia: Native Erlang, different consistency model
Barrel's value is in the combination of:
- Embedded Erlang integration
- Document model with automatic indexing
- P2P replication topologies
- Real-time change subscriptions
- MVCC for conflict-free sync
- ARS architecture for future extensibility
If you only need single-node storage without replication, simpler tools exist. Choose Barrel when you need sync, real-time, and architectural flexibility.
HTTP API vs Direct Erlang API¶
The HTTP API provides remote access but adds overhead from network I/O, JSON serialization, and authentication.
Regular Database (Non-Sharded)¶
| Operation | Direct API (ops/s) | HTTP API (ops/s) | HTTP Overhead |
|---|---|---|---|
| Insert | 8,223 | 1,206 | 85% |
| Read | 18,900 | 790 | 96% |
| Update | 38,066 | 1,124 | 97% |
Virtual Database (VDB, 4 Shards)¶
| Operation | Direct API (ops/s) | HTTP API (ops/s) | HTTP Overhead |
|---|---|---|---|
| Insert | 2,786 | 838 | 70% |
| Read | 991 | 370 | 63% |
| Update | 1,019 | 425 | 58% |
| Query | 1,226 | 703 | 43% |
HTTP API Usage Guidelines
Use the direct Erlang API when possible. The HTTP API is best for:
- Cross-language access (Python, JavaScript, etc.)
- External service integration
- Administrative operations from CLI tools
- Distributed deployments where HTTP is required
For Erlang applications running on the same node, always prefer the direct API (100-200x faster for simple operations).
Connection Pooling¶
The benchmarks above use single connections. In production, use HTTP connection pooling:
%% With hackney pool
Options = [{pool, my_barrel_pool}],
hackney:request(get, Url, Headers, Body, Options).
Connection pooling can improve HTTP throughput by 3-5x by reusing TCP connections.
Virtual Database (VDB) Performance¶
VDB provides automatic sharding for horizontal scalability. There is overhead from shard routing and scatter-gather queries.
VDB vs Non-Sharded (4 Shards)¶
| Operation | Non-Sharded (ops/s) | VDB (ops/s) | VDB Overhead |
|---|---|---|---|
| Insert | 7,176 | 5,827 | 19% |
| Read | 4,381 | 3,647 | 17% |
| Update | 4,831 | 4,130 | 15% |
VDB Query Performance¶
Cross-shard queries use scatter-gather:
| Query Type | Throughput | p50 Latency | p99 Latency |
|---|---|---|---|
| Simple equality (LIMIT 100) | 1,226 ops/s | 815 us | 1,200 us |
| Multi-condition (LIMIT 100) | 1,100 ops/s | 910 us | 1,400 us |
| Full scan (no limit) | 85 ops/s | 11.7 ms | 15.2 ms |
VDB Scaling
VDB overhead is ~15-20% for single-document operations. The benefit comes from:
- Horizontal scalability across nodes
- Parallel query execution across shards
- Automatic data distribution
For single-node deployments, prefer non-sharded databases unless you anticipate scaling out.
Running Benchmarks¶
Run the benchmark suite on your hardware:
cd bench
./run_bench.sh # Default: 10,000 docs, 10,000 iterations
./run_bench.sh 5000 100 # Custom: 5,000 docs, 100 iterations
./run_bench.sh vdb 5000 100 # VDB vs non-sharded comparison
./run_bench.sh http 1000 100 # HTTP API vs Direct API comparison
Or from Erlang:
barrel_bench:run(#{num_docs => 5000, iterations => 100}).
%% Run specific workloads
barrel_bench:run_crud(#{num_docs => 1000}).
barrel_bench:run_query(#{num_docs => 5000, iterations => 100}).
barrel_bench:run_changes(#{num_docs => 1000}).
barrel_bench:run_vdb(#{num_docs => 5000, iterations => 100}).
barrel_bench:run_http(#{num_docs => 1000, iterations => 100}).
Results are saved to bench/results/ as JSON with timestamps.
Query Building Guidelines¶
Building efficient queries is crucial for performance. See the Query Guide for full syntax reference.
Rule 1: Always Paginate¶
Never fetch unbounded result sets. Use limit and continuation tokens:
%% BAD: Fetches all matching documents
{ok, All, _} = barrel_docdb:find(Db, #{where => Query}).
%% GOOD: Paginate with continuation
{ok, Page, #{continuation := Token}} =
barrel_docdb:find(Db, #{where => Query, limit => 100}).
Rule 2: Use Index-Only Queries When Possible¶
Set include_docs => false to skip document body fetches:
%% Returns only doc IDs - 18-27 us
{ok, Ids, _} = barrel_docdb:find(Db, #{
where => [{path, [<<"type">>], <<"user">>}],
include_docs => false,
limit => 100
}).
%% Then fetch specific docs you need
{ok, Doc} = barrel_docdb:get_doc(Db, hd(Ids)).
Rule 3: Put Selective Conditions First¶
More selective conditions reduce the search space:
%% GOOD: Most selective condition first
#{where => [
{path, [<<"user_id">>], <<"specific_user">>}, %% Very selective
{path, [<<"type">>], <<"event">>} %% Less selective
]}
%% LESS OPTIMAL: Broad condition first
#{where => [
{path, [<<"type">>], <<"event">>}, %% Matches many docs
{path, [<<"user_id">>], <<"specific_user">>}
]}
Rule 4: Prefer Prefix Over Regex¶
Prefix queries use efficient index range scans:
%% FAST: 18-23 us - Uses index range scan
{prefix, [<<"name">>], <<"John">>}
%% SLOW: Full scan with regex matching
{regex, [<<"name">>], <<"^John.*">>}
Rule 5: Use LIMIT with ORDER BY¶
Top-K queries are fast when limited:
%% FAST: 233 us - Early termination
#{where => [],
order_by => {[<<"created_at">>], desc},
limit => 10}
%% SLOW: Must sort all documents first
#{where => [],
order_by => {[<<"created_at">>], desc}}
Query Execution Strategies¶
Use explain/2 to see how queries execute:
| Strategy | Performance | When Used |
|---|---|---|
index_seek |
Excellent | Equality on indexed path |
index_scan |
Good | Range queries, prefix |
multi_index |
Good | Multiple conditions |
full_scan |
Avoid | No index available |
Anti-Patterns to Avoid¶
| Pattern | Problem | Solution |
|---|---|---|
| No LIMIT | Fetches entire database | Always paginate |
| OR with many terms | Creates large unions | Consider multiple queries |
| NOT on large sets | Scans exclusions | Restructure query |
| Regex for prefix | Full scan | Use {prefix, ...} |
| Sorting without limit | Sorts all results | Add LIMIT |
Optimizing Performance¶
Write Optimization¶
- Batch writes for bulk inserts
- Disable sync for non-critical writes:
#{sync => false} - Use specific paths in change subscriptions vs wildcards
Configuration Tuning¶
See Architecture for RocksDB tuning:
- Adjust block cache size for your memory budget
- Configure write buffer size for write-heavy workloads
- Enable compression for large documents