The image here appears to be upside down (or rather rotated 180): https://github.com/nubskr/nubmq/blob/master/assets/architect...
It's not clear from the readme physically where the data is stored, nor where in the storage process the "congestion" is coming from.
I'm surprised there's no range scan. Range scans enable a whole swathe of functionality that make kv stores punch above their weight. I suppose that's more rocksdb/dynamo/bigtable/cassandra than redis/memcached, though.
> *"The image here appears to be upside down (or rather rotated 180)"*
Yeah, something weird happened with the image rendering; I'll fix that.
> *"It's not clear from the readme physically where the data is stored, nor where in the storage process the 'congestion' is coming from."*
The data is *fully in-memory*, distributed across dynamically growing shards (think of an adaptive hashtable that resizes itself). There’s no external storage layer like RocksDB or disk persistence—this is meant to be *pure cache-speed KV storage.*
Congestion happens when a shard starts getting too many keys relative to the rest of the system. The engine constantly tracks *contention per shard*, and when it crosses a threshold, we trigger an upgrade (new shards added, old ones redistributed). Migration is *zero-downtime*, but at very high write rates, there’s a brief moment where some writes are directed to the old store while the new one warms up.
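To make that concrete, here's a stripped-down sketch of the layout (not the actual nubmq code; `shard`, `store`, and the `maxKeysPerShard` threshold are illustrative, and the real engine tracks contention rather than just key counts):

```go
package engine

import (
	"hash/fnv"
	"sync"
	"sync/atomic"
)

// shard owns one slice of the keyspace behind its own lock.
type shard struct {
	mu   sync.RWMutex
	data map[string]string
	keys atomic.Int64 // tracked so the engine can spot congestion
}

// store is one "generation" of the engine: a fixed set of shards.
type store struct {
	shards []*shard
}

func newStore(n int) *store {
	s := &store{shards: make([]*shard, n)}
	for i := range s.shards {
		s.shards[i] = &shard{data: make(map[string]string)}
	}
	return s
}

// shardFor hashes a key onto a shard.
func (s *store) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

// maxKeysPerShard is a made-up threshold for illustration.
const maxKeysPerShard = 1 << 16

// Set writes a key and reports whether the shard has crossed the
// congestion threshold, i.e. the signal to build a bigger store.
func (s *store) Set(key, val string) (congested bool) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	if _, exists := sh.data[key]; !exists {
		sh.keys.Add(1)
	}
	sh.data[key] = val
	sh.mu.Unlock()
	return sh.keys.Load() > maxKeysPerShard
}

// Get is a direct lookup under a read lock.
func (s *store) Get(key string) (string, bool) {
	sh := s.shardFor(key)
	sh.mu.RLock()
	v, ok := sh.data[key]
	sh.mu.RUnlock()
	return v, ok
}
```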
> *"I'm surprised there's no range scan."*
Yeah, that’s an intentional design choice—this is meant to be a *high-speed cache*, closer to Redis than a full database like RocksDB or BigTable. Range queries would need an ordered structure (e.g., skip lists or B-trees), which add overhead. But I’m definitely considering implementing *prefix scans* (e.g., `SCAN user:*` style queries) since that’d be useful for a lot of real-world use cases.
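If prefix scans do land, on a hash-sharded layout like the sketch above they'd look roughly like this (a hypothetical `ScanPrefix` on the illustrative `store` type; the full walk over every shard is exactly the overhead an ordered structure would avoid):

```go
import "strings"

// ScanPrefix collects every key starting with prefix. Because shards
// are unordered hash maps, it must visit all shards and all keys:
// O(total keys) rather than the O(log n + matches) a skip list or
// B-tree would give, which is why range scans aren't free here.
func (s *store) ScanPrefix(prefix string) []string {
	var out []string
	for _, sh := range s.shards {
		sh.mu.RLock()
		for k := range sh.data {
			if strings.HasPrefix(k, prefix) {
				out = append(out, k)
			}
		}
		sh.mu.RUnlock()
	}
	return out
}
```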
The link no longer works for me. New link seems to be https://github.com/nubskr/nubmq/blob/master/assets/nubmq_new...
This looks great, how are you doing the benchmarks? It claims to be way faster than Redis. Can you also measure it against Microsoft Garnet? What's the secret sauce for beating it in latency? Also, 700µs for Redis reads sounds high to me. Running it against memtier_benchmark would also be great - https://github.com/RedisLabs/memtier_benchmark
Redis:
Write Latency ~1.1ms
Read Latency ~700µs
Max Throughput ~85,000 ops/sec
Nubmq:
Write Latency 900µs
Read Latency 500µs
Max Throughput 115,809 ops/sec
The benchmarks were done on an M2 MacBook Air (8-core) with 21M requests distributed across 100 concurrent clients sending requests as fast as possible (the test script is also in the GitHub repo).
a few reasons for being fast that come to my mind now are:
1. reads are direct lookups, so distributing them across goroutines makes them faster
2. it's set requests where it gets complicated: if we're simply updating some key's value, the cost is essentially negligible, but creating a new key-value pair can increase per-shard load at scale, which would trigger a store resize. To avoid just stopping everything when that happens, the engine recognises when the per-shard load starts to get too high, creates a bigger store in the background, and then switches writes from the old engine to the new (bigger) one while the old one keeps processing reads. The old engine migrates its keys to the new one in the background, and once that's done, we just dereference the old engine to be collected by the GC :) This essentially makes sure that incoming requests keep getting served (oh shit, I spilled the secret sauce); a sketch of this follows just below.
At least on my machine, with the default settings, under that concurrent load Redis starts slowing down due to single-threaded execution and Lua overhead.
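Since the sauce is spilled anyway, here's that switch-over in sketch form (reusing the illustrative `store` type from above, not the real implementation; a production migration also has to make the "don't clobber newer writes" check atomic, which this glosses over):

```go
// engine wraps the store currently taking writes plus, mid-resize,
// the previous one that is being drained.
type engine struct {
	mu  sync.RWMutex
	cur *store // store receiving new writes
	old *store // previous store, drained in the background (nil when idle)
}

// upgrade swaps writes onto a bigger store and migrates the old keys
// in a background goroutine, so no client ever blocks on the resize.
func (e *engine) upgrade(newSize int) {
	e.mu.Lock()
	src := e.cur
	dst := newStore(newSize)
	e.old, e.cur = src, dst
	e.mu.Unlock()

	go func() {
		for _, sh := range src.shards {
			sh.mu.RLock()
			for k, v := range sh.data {
				if _, ok := dst.Get(k); !ok { // skip keys already rewritten
					dst.Set(k, v)
				}
			}
			sh.mu.RUnlock()
		}
		e.mu.Lock()
		e.old = nil // dereference: the GC reclaims the old store
		e.mu.Unlock()
	}()
}

// Get checks the new store first, falling back to the old one while
// the migration is still in flight, so reads never pause.
func (e *engine) Get(key string) (string, bool) {
	e.mu.RLock()
	cur, old := e.cur, e.old
	e.mu.RUnlock()
	if v, ok := cur.Get(key); ok {
		return v, true
	}
	if old != nil {
		return old.Get(key)
	}
	return "", false
}
```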
Is Redis using all cores, or are you comparing a multi-threaded implementation in Go with a single-threaded one in Redis?
Redis is single-threaded for dataset-modifying commands (SET, DEL, etc.), but multi-threaded for networking and I/O. nubmq, on the other hand, is fully multi-threaded—distributing reads and writes across all cores. So yes, this is fundamentally a comparison between Go’s concurrency model and Redis’s single-threaded event loop.
For benchmarking, I used the same test script (included in the repo) for both Redis and nubmq—100 concurrent clients hammering the server with requests. The core execution models remain unchanged; only the endpoints differ. So if you're asking whether the comparison reflects actual engine throughput rather than just connection handling—yes, it does.
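The real script is in the repo; its shape is roughly this (a minimal, hypothetical harness: `clients` goroutines hammering until a deadline with the actual SET/GET call stubbed out, since only the endpoint and wire protocol differ between the Redis and nubmq runs):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const clients = 100
	const duration = 10 * time.Second

	var ops atomic.Int64
	var wg sync.WaitGroup
	deadline := time.Now().Add(duration)

	for c := 0; c < clients; c++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for n := 0; time.Now().Before(deadline); n++ {
				// in the real script this is a SET or GET against the
				// server under test (Redis or nubmq)
				_ = fmt.Sprintf("user:%d:%d", id, n)
				ops.Add(1)
			}
		}(c)
	}
	wg.Wait()
	fmt.Printf("throughput: %.0f ops/sec\n",
		float64(ops.Load())/duration.Seconds())
}
```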
The real ‘secret sauce’ is how nubmq prevents bottlenecks under load. When per-shard contention spikes, it spins up a larger store in the background, shifts new writes to it, and lazily migrates keys over. No pauses, no blocking. Meanwhile, Redis has no way to redistribute the load—it just starts choking when contention builds.
Also, Redis carries Lua scripting overhead in some cases. nubmq skips all that: pure Golang, no dependencies, no interpreter overhead.
Even if so, that's beside the point IMO. I for one would prefer to use software that makes full use of the system's resources. Bonus points if this is a knob that we can turn.
What first principles were used to build this?
First principle? Don't respect traditions—burn them.
No threads waiting on locks, no hard pauses for resizes, no conventional polling mechanisms. Instead, nubmq's entire architecture revolves around one relentless obsession: never let a client wait. Everything else—adaptive sharding, real-time migrations, soft expiry handling—is just engineering fallout from aggressively challenging every assumption Redis and Memcached made decades ago.
Would love to see which of these breaks with convention you're most curious (or skeptical) about!
I am interested in the assumptions or new paradigms. How to distinguish “from first principles” from “intuitively from scratch”?
Most 'built from scratch' projects still borrow heavily from established thinking—just reimplementing known patterns in different languages or flavors. But first principles means ripping out the assumptions entirely, starting with raw, fundamental constraints: CPU cycles, memory latency, concurrent I/O contention. nubmq doesn't assume that traditional sharding, resizing, or even Redis's single-threaded event loop is optimal just because they're popular. Instead, it assumes almost nothing: keys have to move fast, scaling must be invisible, and downtime is unacceptable. In other words: Don't just tweak existing approaches, burn them to the ground and rebuild from physics upwards. If your 'from scratch' doesn't actively challenge foundational assumptions, it's just another coat of paint, not a true redesign.
To put it bluntly—most engineers today confuse complexity with sophistication. They're stacking libraries and frameworks without questioning if that complexity is needed. nubmq rejects that entirely. It's built on the rawest form of engineering, directly controlling memory, buffers, and goroutine allocation, skipping every third-party crutch that developers usually depend on. It might look radical, even uncomfortable, but that's exactly why it's performant. So, my question back is—are modern developers too afraid to actually build systems this way? Or have we simply gotten complacent with our convenient abstractions?
I vote for the hypothesis of them being complacent and wanting to score quick and easy wins.
In terms of therapeutic approaches I can't blame them, but you're right that a lot of current technology is too old and doesn't even make good use of all the good innovation that has been happening for decades now.
Case in point: so many programs are still single-threaded.
How to contribute? Could you add contribute.md?
have added a contributing md: https://github.com/nubskr/nubmq/blob/master/CONTRIBUTING.md
feel free to open any issues :)
are you using it in production yet?
It's still early, but it's been stress-tested pretty aggressively with high write loads and contention scenarios. Right now, it's feature-complete as a high-performance in-memory KV store with dynamic scaling. Clustering and multi-node setups are on the roadmap before any production rollout, but for single-node workloads, it’s already showing strong results (115K+ ops/sec on an M2 MacBook Air).
Are you thinking about a specific use case?