Ok, so people use NTP to "synchronize" their clocks and then write applications that assume the clocks are in exact sync and can use timestamps for synchronization, even though NTP can see the clocks aren't always in sync. Do I have that right?
If you are an engineer at Google dealing with Spanner, then you can in fact assume clocks are well synchronized and can use timestamps for synchronization. If you get commit timestamps from Spanner you can compare them to determine exactly which commit happened first. That’s a stronger guarantee than the typical Serializable database like postgresql: https://www.postgresql.org/docs/current/transaction-iso.html...
That’s the radical developer simplicity promised by TrueTime mentioned in the article.
That’s actually not at all what TrueTime guarantees, and assuming they’ve solved a physical impossibility is technically dangerous as a founding assumption for higher-level tech (which thankfully Spanner does not do).
What TrueTime says is that clocks are synchronized within some delta just like NTP, but that delta is significantly smaller thanks to GPS time sync. That enables applications to have tighter bounds on waiting to see if a conflict may exist before committing which is why Spanner is fast. CockroachDB works similarly but given the logistical challenge of getting GPS receivers into data centers, they worked to achieve a smaller delta through better NTP-like timestamps and generally get fairly close performance.
> Bounded Uncertainty: TrueTime provides a time interval, [earliest, latest], rather than a single timestamp. This interval represents the possible range of the current time with bounded uncertainty. The uncertainty is caused by clock drift, synchronization delays, and other factors in distributed systems.
That’s exactly what I’m saying; you simply provided more details. TrueTime guarantees clocks are well synchronized, and of course that means synchronized to within a reasonable upper bound. It’s no more possible for clocks to be absolutely synchronized than for two line segments drawn independently to have absolutely the same length.
> you can compare them to determine exactly which commit happened first
This is the part I was referring to. You cannot just compare timestamps and know which happened first. You have to actually handle the case where you don’t know if there’s a happens-before relationship between the timestamps. That’s a very important distinction.
> External consistency states that Spanner executes transactions in a manner that is indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. Because the timestamps generated for transactions correspond to the serial order, if any client sees a transaction T2 start to commit after another transaction T1 finishes, the system will assign a timestamp to T2 that is higher than T1's timestamp.
Of course there is always the edge case where two commits have the same commit timestamp. Therefore from the perspective of Spanner, they happen simultaneously and there is no way to determine which happens first. But there is no need to. There is no causality relationship between them. If you insist, you can arbitrarily assign a happens-before relationship in your own code and nothing will break.
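To make that concrete, here is a minimal sketch in Python of interval timestamps, the only-sometimes-decidable comparison, and Spanner-style commit wait. EPSILON is an invented stand-in for the uncertainty bound a real TrueTime deployment measures and reports; nothing here is Google's actual implementation.

    import time

    EPSILON = 0.005  # illustrative uncertainty bound, not a real TrueTime figure

    def now_interval():
        # TrueTime-style now(): a [earliest, latest] interval, never a point
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def definitely_before(a, b):
        # a, b are (earliest, latest) intervals. True only when the order is
        # certain; overlapping intervals mean "concurrent as far as the clocks
        # can tell" and must be handled, not guessed at.
        return a[1] < b[0]

    def commit_wait(commit_latest):
        # Spanner-style commit wait: hold the commit until its timestamp is
        # guaranteed to be in the past on every clock in the system.
        while now_interval()[0] <= commit_latest:
            time.sleep(0.0005)

The smaller EPSILON is, the shorter the wait before a commit can be exposed, which is exactly why tighter clock sync buys throughput.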
Alternatively, you could guarantee the same synchronization using PPS and PTP fed to the DCD pin of each host's serial port, or to specialized hardware such as modern PTP-enabled smart NICs/FPGAs that can accept PPS input. GPS+PPS gets you to within 20-80ns global synchronization depending on implementation (assuming you're all mostly in the same inertial frame), and allows you to make much stronger guarantees than TrueTime (due to higher-precision distributed ordering guarantees, which translate to lower latency and higher throughput for distributed writes).
Of course, you can do this in good conditions. The extremely powerful part that TrueTime brings is how the system degrades when something goes wrong.
If everyone is synced to +/- 20ns, that's great. Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.
The other benefit of building in this uncertainty to the underlying software design is you don't have to have your entire infrastructure on the same hardware stack. If you have one datacenter that's 20yrs old, has no GPS infrastructure, and operates purely on NTP - this can still run the same software, just much more slowly. You might even keep some of this around for testing - and now you have ongoing data showing what will happen to your distributed system if GPS were to go away in a chunk of the world for some sustained period of time.
And in a brighter future, if we're able to synchronize everyone's clocks to +/- 1ns, the intervals just get smaller and we see improved performance without having to rethink the entire design.
> Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.
Most NTP/PTP appliances have internal OCXO or rubidium clocks with holdover (even for several days).
If time is that important to you then you'll have them, plus perhaps some fibre connections to other sites that are hopefully out of range of the jamming.
> fibre connections to other sites that are hopefully out of range of the jamming.
I guess it's not inconceivable that eventually there's a global clock network using a White-Rabbit-like protocol over dedicated fibre. But if you have to worry about GPS jamming you probably have to worry about undersea cable cutting too.
> and allows you to make much stronger guarantees than TrueTime (due to higher precision distributed ordering guarantees, which translate to lower latency and higher throughput distributed writes).
TrueTime is the software algorithm for managing the timestamps. It’s agnostic to the accuracy of the underlying time source. If it was inaccurate then you get looser bounds and as you note higher latency. Google already does everything you suggest for TrueTime while also having atomic clocks in places.
Yup! I was referring to the original TrueTime/Spanner papers, not whatever's currently deployed. The original paper makes reference to distributed ordering guarantees at the milliseconds' scale precision, which implies many more transactions in flight in the uncertain state and coarser distributed ordering guarantees than the much tighter upper bound you can set with nanoseconds' precision and microseconds' comms latency...
More that they use GPS to synchronize the clocks. Having your own atomic clock doesn’t really improve your accuracy except within the single data center where it's deployed (although I’m sure there are techniques for synchronizing with low bounds against nearby atomic clocks + GPS to get really tight bounds, so they don’t need one in every data center).
Depending on the application you would generally use PTP to get sub-microsecond accuracy. The real trick is that architecture should tolerate various clocks starting or jumping out of sync and self correct.
Unfortunate that the author doesn’t bring up FoundationDB version stamps, which to me feel like the right solution to the problem. Essentially, you can write a value you can’t read until after the transaction is committed and the synchronization infrastructure guarantees that value ends up being monotonically increasing per transaction. They use similar “write only” operations for atomic operations like increment.
The key here is a singleton sequencer component that stamps the new versions. There was a great article shared here on similar techniques used in trading order books (https://news.ycombinator.com/item?id=46192181).
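For readers who haven't used FDB, here's a toy sketch of the idea. This is not FDB's actual API, and the names are made up: a singleton sequencer hands out the stamp at commit time, so a transaction can write it but never read it.

    import itertools, threading

    class Sequencer:
        # Toy stand-in for the sequencer role: one component that stamps
        # each commit with a monotonically increasing version.
        def __init__(self):
            self._lock = threading.Lock()
            self._counter = itertools.count(1)

        def next_versionstamp(self):
            with self._lock:
                return next(self._counter)

    SEQ = Sequencer()

    def commit(writes):
        # The stamp is assigned here, at commit time -- "write only"
        # from the transaction's own point of view.
        return SEQ.next_versionstamp(), writes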
Agree this is the best solution. I’d rather have a tiny failover period than risk serialization issues. Working with FDB has been such a joy: because it’s serializable, it takes away an entire class of errors to consider, leading to simpler implementations.
Yes. A consistent total ordering is what you need (want) in distributed computing. Ultimately, causality is what is important, but consistent ordering of concurrent operations makes things much easier to work with.
Consistent ordering of concurrent operations is easy though. Just detect this case (via logical clocks) then order using node ids or transaction ids if the logical clocks show the transactions as being concurrent. Am I missing something? This feels like a very solved problem. (I’ve worked on CRDTs where we have the same problem. There exist incredibly fast algorithms for this.)
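A sketch of that tiebreak, assuming each op carries a Lamport counter and the id of the node that created it:

    def total_order_key(op):
        # Causally ordered ops already differ in their counters; concurrent
        # ops tie on the counter and are broken deterministically by node id,
        # so every peer sorts the same way.
        return (op["lamport"], op["node_id"])

    ops = [
        {"lamport": 3, "node_id": "b"},
        {"lamport": 3, "node_id": "a"},  # concurrent with the one above
        {"lamport": 2, "node_id": "c"},
    ]
    total_order = sorted(ops, key=total_order_key)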
I don’t think so, I think it is solved in the general sense. However what Spanner does is unique, and it does use synchronised clocks in order to do it.
However, Spanner does not solve the inter-continental ACID database with high write throughput. So I don’t see it as groundbreaking. CRDTs are interesting, I’ve followed your work for a long time, but too constrained to solve this general problem I think.
Yes, though the API of having a write-only value that is a monotonically increasing counter is much simpler than having to think about causality or logical clocks.
For an article written about time, I would have thought there'd be a timestamp on the blog post. Just something to think about if someone stumbles upon this in a few years.
The article doesn't cover the inane stupid that is:
* NTP pool server usage requires using DNS
* people have DNSSEC setup, which requires accurate time or it fails
So if your clock is off, you cannot lookup NTP pool servers via DNS, and therefore cannot set your clock.
This sheer stupid has been discussed with package maintainers of major distros, with ntpsec, and the result is a mere shrug. Often, the answer is "but doesn't your device have a battery backed clock?", which is quite unhelpful. Many devices (routers, IOT devices, small boards, or older machines, etc) don't have a battery backed clock, or alternatively the battery may just have died.
Beyond that, the ntpsec codebase has a horrible bug where if DNS is not available when ntpsec starts, pool server addresses are never, ever retried. So if you have a complete power-fail in a datacentre rack, and your firewalls take a little longer to boot than your machines, you'll have to manually restart ntpsec to even get it to ever sync.
When discussing this bug the ntpsec lads were confused that DNS might not exist at times.
Long story short, make sure you aren't using DNS in any capacity, in NTP configs, and most especially in ntpsec configs.
One good source is just using the IPs provided by NIST. Pool servers may seem fine, but I'd trust IPs assigned to NIST to exist longer than any DNS anyhow. E.g., for decades.
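For example, a chrony config with no hostnames in it at all. The 192.0.2.x addresses below are documentation placeholders, not real servers; substitute the actual IPs from NIST's published server list:

    # /etc/chrony/chrony.conf -- no DNS (and thus no DNSSEC) needed at boot
    server 192.0.2.10 iburst
    server 192.0.2.11 iburst
    # allow stepping the clock on the first few updates if it's wildly wrong
    makestep 1.0 3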
AWS has the Google TrueTime equivalent precision clock available for public use[1], which makes this problem much easier to solve now. Aurora DSQL uses it. Even third-party DBs like YugabyteDB make use of it.
As a user of White Rabbit, I can confirm sub-10ps sync (two clocks phase-locked) over a 50km fiber connection with varying fiber temperature (the biggest problem of clock sync over fiber is temperature-induced length change of the fiber itself, which needs to be measured and compensated).
The standards-compliant endpoints do all of the work. They count clock cycles for ping pong messages and share with each other the length of time so time-of-flight is tracked and compensated for.
I wouldn't say it's a 'nightmare'. It's just more complicated than how regular folk think computers work when it comes to time sync. There's nothing nightmarish or scary about this; it's just using the best solution for your scenario, understanding limitations and adjusting expectations/requirements accordingly, perhaps relaxing consistency requirements.
I worked on the NTP infra for a very large organization some time ago, and the scariest thing I found was just how bad some of the clocks were on 'commodity hardware'. But this just added a new parameter for triaging hardware for manufacturer replacement.
This is an ok article but it's just so very superficial. It goes too wide for such a deep subject matter.
Maybe. But I remember one game developer telling me that they face an even more challenging problem: synchronization between players in multiplayer real-time games. Just imagine different users having significantly different network latencies in a multiplayer shooter where a couple of milliseconds can be decisive. Someone makes a headshot when the game state is already outdated. If you think about this, you can appreciate how complicated it is just to make the gameplay at least not awful...
I took to distributed systems like a duck to water. It was only much later that I figured out that while there are things I can figure out in one minute that take other people five, there are a lot of other things you have to walk people through step by step or they would never get there. That really explained some interactions I’d had when I was younger.
In particular I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics, or who never took intro to computer engineering.
> I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics
Yeah. I was a physics major and it really helped to have had my naive assumptions about time and clocks completely demolished early on by taking classes in special and general relativity. When I eventually found my way into tech, a lot of distributed systems concepts that are difficult for other people (clock sync, indeterminate ordering of events, consensus) came quite naturally because of all that early training.
I think it's no accident that distributed systems theory guru Leslie Lamport had written an unpublished book on General Relativity before he wrote the famous Time, Clocks and the Ordering of Events in a Distributed System paper and the Paxos paper. In the former in particular the analogy to special relativity is quite plain to see.
Sometimes hardware that has PTP support in the specs doesn't perform very well though, so if you do things at scale, being able to validate things like switches and network card drivers is useful too!
It's to the point timing server vendors I've spoken to have their own test labs where they have to validate network gear and then publish lists of recommended and tested configurations.
Even some older cards where you'd think the PTP issues would be solved still have weird driver quirks in Linux!
Reminds me of the old saying: 'If you have just one watch/clock, then you always know what time it is; but if you have two of them, then you are never sure!'
Clock sync is such a nightmare in robotics. Most OSes will happily skew/jump the clock to get the time correct. Time jumps (especially backwards) will crash most robotics stacks. You might decide to ensure that you have synced time before starting the stack. Great, now your timestamps are mostly accurate, except what happens when you've used GPS as your time source and you start indoors? Robot hangs forever.
Hot take: I've seen this and enough other badly configured time sync settings that I want to ban system time from robotics systems - time from startup only! If you want to know what the real-world time was for a piece of data afterwards, write down what your epoch is once you have a time sync, and add epoch + startup-relative time.
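A minimal sketch of that pattern, assuming Python on the robot: everything internal runs on a monotonic clock, and wall time is just an offset recorded after the fact.

    import time

    class RobotClock:
        def __init__(self):
            self._t0 = time.monotonic()
            self._epoch_at_t0 = None  # unknown until a time fix arrives

        def now(self):
            # seconds since startup: never jumps, never goes backwards
            return time.monotonic() - self._t0

        def record_time_fix(self, wall_epoch_seconds):
            # called once, whenever GPS/NTP finally produces a fix
            self._epoch_at_t0 = wall_epoch_seconds - self.now()

        def to_wall_time(self, stamp):
            if self._epoch_at_t0 is None:
                raise RuntimeError("no time fix yet; stamp is start-relative only")
            return self._epoch_at_t0 + stamp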
If your requirements are “must have accurate time, must start with an inaccurate time, must not step time during operation, no atomic clocks, must not require a network connection, or a WWVB signal, must work without a GPS signal” then yes, you need to relax your requirements.
But it doesn’t have to be the first requirement you relax.
If it has a GPS already, it’s really easy to fall into the trap of just using it, but point taken. The main requirement is accurate moment-to-moment time. Using GPS as the master clock mostly makes sense there.
Normally I would nod at the title. Having lived it.
But I just watched/listened to a Richard Feynman talk on the nature of time and clocks and the futility of "synchronizing" clocks. So I'm chuckling a bit. In the general sense, I mean. Yes yes, for practical purposes in the same reference frame on earth, it's difficult but there's hope. Now, in general ... synchronizing two clocks is ... meaningless?
Feynman was not entirely sincere. The implosion of a nuclear device requires precise synchronization of multiple detonations: the more precisely you can trigger them, the less fissile material you need for the sphere. To this day, high-accuracy bridgewire/foil bridge designs remain on the ITAR list.
Wild. My layperson mind goes to a simple example, which may or may not be possible, but please tell me if this is the gist:
Alice and Bob, in different reference frames, both witness events C and D occurring. Alice says C happened before D. Bob says D happened before C. They're both correct. (And good luck synchronizing your watches, Alice and Bob!)
Yes, that definitely happens. People orbiting Polaris would see two supernovas explode at different times than we do, due to the speed of light. Polaris is 400 light years away, so the gap could be large.
But when you are moving you may see very closely spaced events in a different order, because you’re moving toward Carol but at an angle to Doug. Versus someone else moving toward Doug at an angle to Carol.
There is a distinction between seeing when events happened and when they really happened. The latter can be reconstructed by an observer.
In special relativity, time is relative, and when things actually happened can be different in different frames. Causally linked events are always really in the same order. But disconnected events can be seen in different orders depending on the speed of the observer.
> But disconnected events can be seen in different orders depending on speed of observer.
What are "disconnected events"? In a subtle but still real sense, are not all events causally linked? e.g. gravitationally, magnetically, subatomically or quantumly?
I can understand that our simple minds and computational abilities lead us to consider events "far away" from each other as "disconnected" for practical reasons. But are they really not causally connected in a subtle way?
There are pieces of space time that are clearly, obviously causally connected to each other. And there are far away regions of the universe that are, practically speaking, causally disconnected from things "around here". But wouldn't these causally disjoint regions overlap with each other, stringing together a chain of causality from anywhere to anywhere?
Or is there a complete vacuum of insulation between some truly disconnected events that don't overlap with any other observational light cone or frame of reference at all?
We now know that gravity moves at the speed of light. Imagine that you are between two supernovas that, for some unknown reason, explode at essentially the same time. Just before you die from radiation exposure, you will see the light pulse from each supernova before each supernova can 'see' the gravitational disruption caused by the other. Maybe a gravity wave can push a chain reaction on the verge of happening into either a) happening or b) being delayed for a brief time, but the second explosion happened before the pulse from the first could have arrived. So you're pretty sure they aren't causally linked.
However if they were both triggered by a binary black hole merger, then they're dependent events but not on each other.
But I think the general discussion is more of a 'Han shot first' sort. One intelligent system reacting to an action of another intelligent system, and not being able to discern as a person from a different reference frame as to who started it and who reacted. So I suppose when we have relativistic duels we will have to preserve the role of the 'second' as a witness to the events. Or we will have to just shrug and find something else to worry about.
Causality moves at the speed of light. Events separated by more distance than light can cross in the time between them are called spacelike separated, and they aren't causally connected.
I think you might be confusing events that have some shared history with events that influence each other. Say right now a Martian rover sends a message to Earth and Earth sends a message to it: those aren't causally connected, because neither knows about the other's message until the light-speed delay has passed.
> But wouldn't these causally disjoint regions overlap with each other
Yes.
> stringing together a chain of causality from anywhere to anywhere?
No? Causality reaching one edge of a sphere doesn't mean it instantaneously teleports to every point in that same sphere. This isn't a transitive relationship.
> What are "disconnected events"?
The sentence you're responding to seems like a decent definition. Disconnected events are events which might be observed in either order depending on the position of an observer.
Nature (the laws of physics) is against you on this: it is in fact impossible for everyone. What is in sync for some observers can be out of sync for others (it depends on where they are, i.e. gravity, and on how they move relative to each other). See the relativity of simultaneity in special and general relativity [1].
I think you just nerd-sniped me but I’m not convinced it’s impossible to assign a consistent ordering to events with relativistic separations.
For starters, the spacetime interval between two events IS a Lorentz invariant quantity. That could probably be used to establish a universal order for timelike separations between events. I suspect that you could use a reference clock, like a pulsar or something to act as an event against which to measure the spacetime interval to other events, and use that for ordering. Any events separated by a light-like interval are essentially simultaneous to all observers under that measure.
The problem comes for events with a space like or light like separation. In that case, the spacetime interval is still conserved, but I’m not sure how you assign order to them. Perhaps the same system works without modification, but I’m not sure.
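For reference, the invariant being leaned on here, written in the (+,-,-,-) sign convention (conventions vary, so check which one your source uses):

    \Delta s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2

    % timelike:  \Delta s^2 > 0  -- all observers agree on the order
    % lightlike: \Delta s^2 = 0
    % spacelike: \Delta s^2 < 0  -- the order depends on the observer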
For any pair of space-like separated events you can find reference frames where they happen in different orders. For the time-like situation you described, the order indeed exists within the light cone, which is to say that causality exists.
In physics, time is local and relative; independent events don’t need a global ordering. Distributed databases shouldn’t require one either. The idea of a single global time comes from 1980s single-node database semantics, where serializability implied one universal execution order. When that model was lifted into distributed systems, researchers introduced global clocks and timestamp coordination to preserve those guarantees, not because distributed systems fundamentally need them. It’s time we rethink this. Only operations that touch the same piece of data require ordering. Everything else should follow causality like the physical universe: independent events don’t need to agree on sequence, only dependent ones do. Global clocks exist because some databases forced serializable cross-object transactions onto distributed systems, not because nature requires it.
Edit: I welcome a discussion with people who disagree and downvote.
You can’t be certain that any given mutating operation you perform now won’t be relied upon for some future operation, unless the two operations are performed in entirely different domains of data. Even “not touching (by which I assume you mean mutating) the same data” isn’t enough. If I update A in thread 0 from 1 to 2, then I update B in thread 1 to the value of A+1, then the value of B could end up being 2 or 3, depending on whether the update of A reached thread 1.
In distributed systems, dependencies flow forward, not backward. Causal dependency only exists when an operation actually references earlier state.
If B = A+1, then yes, B is causally dependent on A and they must share an order.
But that dependency is created by the application logic, not assumed globally in advance.
We shouldn’t impose a universal timeline just because some future operation might depend on some past one. Dependencies should be explicit and local: if two operations interact, they share a causal scope; if they don’t, they shouldn’t pay the cost of coordination.
Timesync isn’t a nightmare at all. But it is a deep rabbit hole.
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
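As a sketch of that mental model: each edge can be approximated as an affine map (skew and offset, both re-estimated continuously in a real system; fixed constants here for illustration), and a conversion composes the maps along the path.

    class ClockEdge:
        # t_dst ~= skew * t_src + offset; both parameters drift in reality
        def __init__(self, skew, offset):
            self.skew, self.offset = skew, offset

        def convert(self, t):
            return self.skew * t + self.offset

    def convert_along_path(t, edges):
        # every hop adds its own estimation error and jitter,
        # hence "the fewer stops the better"
        for e in edges:
            t = e.convert(t)
        return t

    # e.g. sensor clock -> MCU clock -> host clock
    path = [ClockEdge(1.0000012, -3.2), ClockEdge(0.9999991, 120.7)]
    host_time = convert_along_path(42.0, path)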
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
At this stage, it's difficult to find a half-decent ethernet MAC that doesn't have PTP timestamping. It's not a particularly complicated protocol, either.
I needed to distribute PPS and 10MHz into a GNSS-denied environment, so last summer I designed a board to do this using 802.1AS gPTP with a uBlox LEA-M8T GNSS timing receiver, a 10MHz OCXO and an STM32F767 MCU. This took me about four weeks. Software is written in C, and the PTP implementation accounts for 1500 LOC.
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent of PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
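The core of those methods fits in a few lines. Each ping-pong exchange bounds the remote clock's offset from both sides, and the intersection of the bounds only ever tightens (variable names here are my own, not TicSync's):

    def offset_bounds(t_send, t_remote, t_recv):
        # We send at t_send (our clock), the peer stamps t_remote (its clock),
        # we receive the reply at t_recv (our clock). With theta = their clock
        # minus ours, one-way delays are non-negative, so:
        #   t_remote - t_recv <= theta <= t_remote - t_send
        return (t_remote - t_recv, t_remote - t_send)

    def tightest(bounds):
        # intersect the intervals from many exchanges
        lows, highs = zip(*bounds)
        return (max(lows), min(highs))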
> This sounds like the "peer to peer" equivalent to PTP. It would require every node to maintain state about it's estimate (skew, slew, variance) of every other clock.
Well, it requires having the conversion function for each edge in the traversed path. And such a function needs to exist only at the location(s) performing the conversion.
> obviously it adds complexity to end-stations beyond what PTP requires
If you have PTP and it works then stick with it. If you’re trying to timesync a network of wearable devices then you don’t have PTP stamping hardware.
> because packets traversing different routes
Fair callout. It’s probably a more useful model for less internetty use cases. Of which there are many!
For example when trying to timesync a collection of different sensors on different devices/microcontrollers.
Roboticists like CAN bus and EtherCAT. But even that is kinda overkill imho. TicSync can get you tens of microseconds of precision in user space.
PTP requires support not only on your network, but also on your peripheral bus and inside your CPU. It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states.
You can if you just run PTP (almost) entirely on your NIC. The best PTP implementations take their packet timestamps at the MAC on the NIC and keep time based on that. Nothing about CPU processing is time-critical in that case.
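On Linux this is exposed through the SO_TIMESTAMPING socket option. A rough sketch follows; the constant values are copied from the kernel headers (asm-generic/socket.h, linux/net_tstamp.h), so verify them on your system, and the NIC itself also needs timestamping enabled (e.g. with hwstamp_ctl from linuxptp).

    import socket

    SO_TIMESTAMPING = 37                    # assumed from kernel headers
    SOF_TIMESTAMPING_RX_HARDWARE  = 1 << 2
    SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 319))  # 319 is the PTP event port
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING,
                    SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE)
    # each recvmsg() now carries ancillary data holding the timestamp the MAC
    # took when the packet arrived, independent of what the CPU was doing
    data, ancdata, flags, addr = sock.recvmsg(2048, 1024)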
Well, if the goal is for software running on the host CPU to know the time accurately, then it does matter. The control loop for host PTP benefits from regularity. Anyway NICs that support PTP hardware timestamping may also use PCI LTR (latency tolerance reporting) to instruct the host operating system to disable high-exit-latency sleep features, and popular operating systems respect that.
> The control loop for host PTP benefits from regularity.
How much regularity? If you sent PTP packets with 5 milliseconds of randomness in the scheduling, does that cause real problems? It's still going to have an accurate timestamp.
> instruct the host operating system to disable high-exit-latency sleep features
Why, though? You didn't explain this. As long as the packet got timestamped when it arrived, the CPU can ask the NIC how many nanoseconds ago that was, and correct for how long it was asleep. Right?
I see nothing in your pair of unnecessarily belligerent comments that actually contradicts what I said. There are host-side features that enable the clock discipline you are observing, even if you are apparently not aware of them.
This is a really helpful contribution - if only everyone could be as smart as you.
If mine are somehow too belligerent for you, which is hilarious given how arrogant and belligerent your initial comment and responses come off as (maybe you are not aware?), then perhaps you'd like to actually engage any of the other comments that point out how wrong you are in a meaningful way?
Or are those too belligerent as well?
Because you didn't respond to any of those, either.
PTP does not require support on your network beyond standard ethernet packet forwarding when used in ethernet mode.
In multicast IP mode, with multiple switches, it requires what anything running multicast between switches/etc would require (i.e. some form of IGMP snooping or multicast routing or .....)
In unicast IP mode, it requires nothing from your network.
Therefore, I have no idea what it means to "require support on the network".
I have used both ethernet and multicast PTP across a complete mishmash of brands and types and medias of switches, computers, etc, with no issues.
The only thing that "support" might improve is more accurate path delay data through transparent clocks. If both master and slave do accurate hardware timestamping already, and the path between them is constant, it is easily possible to get +-50 nanoseconds without any transparent clock support.
Here are the stats from a random embedded device running PTP that I accessed a second ago:
    Reference ID    : 50545030 (PTP0)
    Stratum         : 1
    Ref time (UTC)  : Sun Dec 28 02:47:25 2025
    System time     : 0.000000029 seconds slow of NTP time
    Last offset     : -0.000000042 seconds
    RMS offset      : 0.000000034 seconds
    Frequency       : 8.110 ppm slow
    Residual freq   : -0.000 ppm
    Skew            : 0.003 ppm
So this embedded ARM device, which is not special in any way, is maintaining time +-35ns of the grandmaster, and currently 30ns of GPS time.
The card does not have an embedded hardware PTP clock, but it does do hardware timestamping and filtering.
This grandmaster is an RPI with an intel chipset on it and the PPS input pin being used to discipline the chipset's clock. It stays within +-2ns (usually +-1ns) of GPS time.
Obviously, holdover sucks, but not the point :)
This qualifies as better-than-NTP for sure, and this setup has no network support. No transparent clocks, etc. These machines have multiple media transitions involved (fiber->ethernet), etc.
The main thing transparent clock support provides in practice is dealing with highly variable delay. Either from mode of transport, number of packet processors in between your nodes, etc. Something that causes the delay to be hard to account for.
The ethernet packet processing in ethernet mode is being handled in hardware by the switches and basically all network cards. IP variants would probably be hardware assisted but not fully offloaded on all cards, and just ignored on switches (assuming they are not really routers in disguise).
The hardware timestamping is being done in the card (and the vast majority of ethernet cards have supported PTP hardware timestamping for >1 decade at this point), and works perfectly fine with deep CPU sleep states.
Some don't do hardware filtering, so they essentially are processing more packets than necessary, but .....
> Google faced the clock synchronization problem at an unprecedented scale with Spanner, its globally distributed database. They needed strong consistency guarantees across data centers spanning continents, which requires knowing the order of transactions.
> Here’s a video of me explaining this.
Do you need a video? Do we need a 42 minute video to explain this?
I generally agree with Feynman on this stuff. We let explanations be far more complex than they need to be for most things, and it makes the hunt for accidental complexity harder because everything looks almost as complex as the problems that need more study to divine what is actually going on there.
For Spanner to be useful they needed a high transaction rate and in a distributed system that requires very tight grace periods for First Writer Wins. Tighter than you can achieve with NTP or system clocks. That’s it. That’s why they invented a new clock.
Google puts it this way:
Under external consistency, the system behaves as if all transactions run sequentially, even though Spanner actually runs them across multiple servers (and possibly in multiple datacenters) for higher performance and availability.
But that’s a bit thick for people who don’t spend weeks or years thinking about distributed systems.
I highly recommend that anyone look up how PTP works and how it compares to NTP. Clock sync is very interesting. When I joined an HFT company, the first thing I did was understand this stuff. We care about it a lot[1].
If you want a specific question to answer, try this: why does PTP need hardware timestamping to achieve high precision (where the network card itself assigns timestamps to packets, rather than having the kernel do it as part of TCP/IP processing)? If we use software timestamps, why can we achieve microsecond precision at best? If you understand this, it goes a very long way toward understanding the core ideas behind precise clock sync.
Once you have a solid understanding of PTP, look into White Rabbit. They’re able to sync two clocks with sub-ns precision. In case that isn’t obvious, that is absolutely insane.
[1] So do a lot of people. For example audio engineers. Once, an audio engineer absolutely talked my ear off about PTP. I had no idea that audio people understood clock sync so well, but they do!
> So do a lot of people. For example audio engineers.
Indeed. PTP (various, not-necessarily compatible, versions) is at the core of modern ethernet-based audio networking: Dante (proprietary, PTP: IEEE 1588 v1), AVB (IEEE standard, PTP: 802.1AS), AES67 (AES standard, PTP: IEEE 1588 v2). And now the scope of the AVB protocol stack has been expanded to TSN for industrial and automotive time sensitive network applications.
Yeah, the audio engineer then talked my ear off about networking!
Not just audio, anybody in the live events / production space needs all equipment marching in lock step.
If it's for an event, can they not bring all the devices together in close proximity and sync them somehow? That at least removes network delays
You can't sync individual oscillators precisely for very long.
I find time accuracy to be ridiculously interesting, and I have had to talk myself out of buying a used atomic clock to play with [1]. I think precision time is very cool, and a small part of me wants to create the most overly engineered wall clock using a Raspberry Pi or something to have sub-microsecond accuracy.
Sadly, they're generally just a bit too expensive for me to justify it as a toy.
I don't work in trading (though not for lack of trying on my end), so most of the stuff I work on has been a lot more about "logical clocks", which are cool in their own right, but I have always wondered how much more efficient we could be if we had nanosecond-level precision to guarantee that locks are almost always uncontested.
[1] I'm not talking about those clocks that radio to Colorado or Greenwich, I mean the relatively small ones that you can buy that run locally.
> When two transactions happen at nearly the same time on different nodes, the database must determine which happened first. If clocks are out of sync, the database might order them incorrectly, violating consistency guarantees.
This is only true if you use wall clock time as part of your database’s consistency algorithm. Generally I think this is a huge mistake. It’s almost always much easier to swap to a logical clock - which doesn’t care about wall time. And then you don’t have to worry about ntp.
The basic idea is this: event A happened before event B iff A (or something that happened after A) was observed by the node that generated B before B was generated. As a result, you end up with a dag of events - kind of like git. Some events aren’t ordered relative to one another. (We say, they happened concurrently). If you ever need a global order for all events, you can deterministically pick an arbitrary order for concurrent events by comparing ids or something. And this will give you a total order that will be the same on all peers.
If you make database events work like this, time is a little more complex. (It’s a graph traversal rather than simple numbers). But as a result the system clock doesn’t matter. No need to worry about atomic clocks, skew, drift, monotonicity, and all of that junk. It massively simplifies your system design.
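A minimal sketch of the happened-before test on such a DAG, assuming each event records the ids of its causal parents (like commit parents in git):

    def happened_before(a, b, parents):
        # a -> b iff a is reachable from b through b's causal parents
        stack, seen = list(parents[b]), set()
        while stack:
            e = stack.pop()
            if e == a:
                return True
            if e not in seen:
                seen.add(e)
                stack.extend(parents[e])
        return False

    parents = {"A": [], "B": ["A"], "C": ["A"]}
    happened_before("A", "B", parents)   # True
    happened_before("B", "C", parents)   # False: B and C are concurrent,
                                         # so break the tie by comparing ids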
Related in many ways: https://www.erlang.org/docs/22/apps/erts/time_correction
Also I still remember having fun with the "Determine the order of events by saving a tuple containing monotonic time and a strictly monotonically increasing integer as follows" part.
I wouldn't say it's a mistake. Distributed algorithms that depend on wall clock time generally give better guarantees. Usually you want these guarantees. The downside is of course you need to keep accurate time. In the cases you don't need them (eg. for the case you described), sure, but as an engineer you don't always get to choose your constraints.
Unfortunately, some of us have to deal with things like billing, transaction timing to validate what a client's logs might have on their systems, and so on.
My take on this is that second-level timing is close enough for this, and that all my internal systems need to agree on the time. So if I'm off by 200ms or some blather from the rest of the world, I'm not overly concerned. I am concerned, however, if a random internal system is not synced to my own NTP servers.
This doesn't mean I don't keep our servers synced, just that being off by some manner of ms doesn't bother me inordinately. And when it comes to timing of events, yes, auto-increment IDs or some such are easier to deal with.
On the flipside, clock sync for civilians has never been easier. Thanks to NTP any device with an Internet connection can pretty easily get time accurate to 1 second, often as little as 10 ms. All major consumer computers are preconfigured to sync time to one of several reliable NTP pools.
This post is about more complicated synchronization for more demanding applications. And it's very good. I'm just marveling at how in my lifetime I went from "no clock is ever set right" to assuming most anything is within a second of true time.
I was doing something at work that involved calculating round trip times from/to Android devices, and learned that although it should be possible for NTP to sync clocks with below-second precision, in practice many of the Android devices I was working with (mostly Pixels 2-7) were off from my server and each other by up to 5 seconds, which blew my mind.
Depending on carrier-specific configuration and firmware phones may be configured to prefer NITZ (time transmitted by the cellular network) instead of NTP. That time is probably what’s off and would explain your observation.
It's hard to keep a phone's clock closely synchronized because they experience a lot of temperature swings, going between pockets and hands and open air and sometimes in direct sun, and the processor goes between idle and 100% as well.
Once you get to international phones, you'll have places where the phone does not include all timezones and is specifically missing the actual local timezone, so automatic sync is typically disabled so that the time can be set so that the displayed time matches local time... even if that means the system time is not correct.
It’s not that hard. You would not expect 5 sec drift on phones that can sync time on the web at least once a day or once a week. A basic quartz crystal can keep time to within seconds per month of drift. High quality phones can do the same or better. Also the phone should keep track of system time as epoch time, and convert to local.
> Also the phone should keep track of system time as epoch time, and convert to local.
Yes, but imagine your local time is US Pacific time, but you have a phone intended to be sold in Mexico, so your phone only has Mexico time zones and MX Pacific Time has no DST. During part of the year, you can use automatic time sync, but during the summer, you disable automatic sync and set the clock so that the time displayed matches local time. Your epoch time is now an hour ahead of properly synched devices, but whatevs, your phone shows the right time and that's what counts.
Don't a lot of cellular networks rely on highly synchronized clocks to properly handle TDMA-style transmissions? Shouldn't they be very in sync with the towers' times?
I do believe there are (fairly) tight tolerances for clock synchronization between the network and the user equipment/handsets, but I don't know that it necessarily involves communicating the wallclock time. And the oscillators for signal timing aren't necessarily used for timekeeping.
I get that, but often the towers use GPS-disciplined oscillators from what I understand (and have seen in limited circumstances), so they inherently know exactly what time it is. Seems trivial to sync that as well; I just kind of assumed they did that.
Yeah it can only do so much with varying latency
> clock sync for civilians has never been easier
I don't think civilian clock synchronization has been an issue for a long time.
DCF77 and WWVB have been around for more than 50 years. You could use some cheap electronics and get well below millisecond accuracy. GPS has been fully operational for 30 years, but it needs a more expensive device.
I suspect you could even get below 1 sec accuracy using a watch with a hacking movement and listening to radio broadcast of time beeps / pips.
Both of the WWVB clocks I've owned have been very fickle about how they're placed because RF be that way sometimes, and Colorado isn't exactly nearby to my location in Ohio.
The first manufactured GPS clock I owned (as in: switch it on and time is shown on a dedicated display) was in a 2007 Honda.
But a firmware bug ruined that clock: https://didhondafixtheclocks.com/
And even after it began displaying the right time again, it had the wrong date. It was offset by years and years, which was OK-ish, but also by several months.
Having the date offset by months caused the HVAC to behave in strange incurable ways because it expected the sun to be in positions where it was not.
But NTP? NTP has never been fickle for me, even in the intermittently-connected dialup days I experienced ~30 years ago: If I can get to the network occasionally, then I can connect to a few NTP servers and keep a local clock reasonably-accurate.
NTP has been resolutely awesome for me.
The WWVB signal is on longwave, below the AM broadcast band, which means it carries a great distance despite the low transmission power, but only at nighttime. Ohio is nothing; the signal needs to make it to the southern reaches of Florida.
On the one hand, some sloppy GPS units fail on a 20 year schedule. On the other hand, a bunch of things using NTP are going to fail in about ten years. (2036 rather than 2038 because reasons)
Yeah, good point.
If I ever get the chance, I'll try to remember to tell the 1995 version of me to watch out for that pesky overflow bug that they might experience with NTP -- two score and 1 year in their future.
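For anyone who wants to check the arithmetic behind the "because reasons": NTP's 32-bit seconds field counts from 1900 and is unsigned, while classic 32-bit Unix time_t counts from 1970 and is signed. A quick Python check:

    from datetime import datetime, timedelta, timezone

    # NTP era 0: unsigned 32-bit seconds since 1900-01-01
    ntp_epoch = datetime(1900, 1, 1, tzinfo=timezone.utc)
    print(ntp_epoch + timedelta(seconds=2**32))   # 2036-02-07 06:28:16+00:00

    # Classic Unix time_t: signed 32-bit seconds since 1970-01-01
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    print(unix_epoch + timedelta(seconds=2**31))  # 2038-01-19 03:14:08+00:00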
At this point the only clock in my life that doesn't auto set is the one on my stove, and that's because I abhor internet connected kitchen appliances.
Good ol' Oven Standard Time (OST).
Same here. I wish there was an easy way around it (that doesn't require me to play sysadmin in my spare time).
In the 80s my uncle had digital clocks that used an antenna to tune into the atomic clock time signal that (was/is?) broadcast nationwide. I've long wished that it was incorporated into stoves, microwaves, essentially everything that isn't an internet device (yet... sigh)
Sadly I think the actual antenna and hardware were relatively large since it's a long wave signal, but maybe with SDR it'll all fit on the head of a pin these days.
> atomic clock time signal that (was/is?) broadcast nationwide
Probably DCF77 or WWVB.
> I think the actual antenna and hardware were relatively large since it's a long wave signal
Casio has some normal sized wristwatches that synchronize to DCF77; the hardware would definitely fit into a stove, microwave, or basically anything.
I believe it was a longwave broadcast so probably WWVB which would apparently imply a 60mm antenna, but it was a standard old school "GE digital clock radio" form factor so size wasn't at a premium.
> Sadly I think the actual antenna and hardware were relatively large since it's a long wave signal, but maybe with SDR it'll all fit on the head of a pin these days.
Unfortunately there's no real way to cheat physics as far as shrinking a wavelength goes. With RF antennas, about the best you can do is a major dimension around 1/10th the wavelength of interest.
There are many DCF77 receivers in Germany that are contained in a square box that's barely large enough for a AA battery; the rest of the square contains the motor/gears and the electronics/receiver (incl. a ferrite loopstick antenna).
The wavelength is around 3.8km...
Yeah, that's because it's receiving an extremely narrowband signal accumulated over a long window so it can suffer the trash efficiency I'm talking about.
Back when I was studying computer science, I was taking the OS exam and the part about Lamport timestamps [0] was optional, but I had studied it because I loved it. When I mentioned it to my professor, he was so happy to hear something new that day that he asked me to describe it in detail. This was the year 2001.
Many years later, in 2020, I ended up living in San Francisco, and I had the fortune to meet Leslie Lamport after I sent him a cold email. Lovely and smart guy. This is the text of the first part of that email, just for your curiosity:
Hey Leslie!
You have accompanied me for more than 20 years. I first met your name when studying Lamport timestamps.
And then on, and on, and on, up to a few minutes ago, when I realized that you are also behind the paper and the title of "Byzantine Generals problem", renamed from the "Albanian" generals at the suggestion of Jack Goldberg. Who is he? [1]
[0]: https://en.wikipedia.org/wiki/Lamport_timestamp
[1]: Jack Goldberg (now retired) was a computer scientist and Lamport's manager at SRI.
Ok, so people use NTP to "synchronize" their clocks and then write applications that assume the clocks are in exact sync and can use timestamps for synchronization, even though NTP can see the clocks aren't always in sync. Do I have that right?
If you are an engineer at Google dealing with Spanner, then you can in fact assume clocks are well synchronized and can use timestamps for synchronization. If you get commit timestamps from Spanner you can compare them to determine exactly which commit happened first. That’s a stronger guarantee than the typical Serializable database like postgresql: https://www.postgresql.org/docs/current/transaction-iso.html...
That’s the radical developer simplicity promised by TrueTime mentioned in the article.
That’s actually not at all what TrueTime guarantees, and assuming they’ve solved a physical impossibility is technically dangerous as a founding assumption for higher-level tech (which, thankfully, Spanner does not do).
What TrueTime says is that clocks are synchronized within some delta just like NTP, but that delta is significantly smaller thanks to GPS time sync. That enables applications to have tighter bounds on waiting to see if a conflict may exist before committing which is why Spanner is fast. CockroachDB works similarly but given the logistical challenge of getting GPS receivers into data centers, they worked to achieve a smaller delta through better NTP-like timestamps and generally get fairly close performance.
https://programmingappliedai.substack.com/p/what-is-true-tim...
> Bounded Uncertainty: TrueTime provides a time interval, [earliest, latest], rather than a single timestamp. This interval represents the possible range of the current time with bounded uncertainty. The uncertainty is caused by clock drift, synchronization delays, and other factors in distributed systems.
That’s exactly what I’m saying but you simply provided more details. TrueTime guarantees clocks are well synchronized: and of course that means synchronized to a reasonable upper bound. It’s no more possible for clocks to be absolutely synchronized, than for two line segments drawn independently to have absolutely the same length.
> you can compare them to determine exactly which commit happened first
This is the part I was referring to. You cannot just compare timestamps and know which happened first. You have to actually handle the case where you don’t know if there’s a happens-before relationship between the timestamps. That’s a very important distinction.
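To make that concrete, here is a minimal sketch (not Spanner's actual API) of what comparing interval timestamps looks like; the third outcome is the one you're forced to handle:

    def happened_before(a, b):
        """Compare two (earliest, latest) uncertainty intervals.
        Returns True/False when the order is certain, None when unknown."""
        a_earliest, a_latest = a
        b_earliest, b_latest = b
        if a_latest < b_earliest:
            return True       # a definitely finished before b began
        if b_latest < a_earliest:
            return False      # b definitely finished before a began
        return None           # intervals overlap: no certain order

    print(happened_before((10.000, 10.007), (10.020, 10.027)))  # True
    print(happened_before((10.000, 10.007), (10.005, 10.012)))  # None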
I quote from Spanner docs at https://docs.cloud.google.com/spanner/docs/true-time-externa...
> External consistency states that Spanner executes transactions in a manner that is indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. Because the timestamps generated for transactions correspond to the serial order, if any client sees a transaction T2 start to commit after another transaction T1 finishes, the system will assign a timestamp to T2 that is higher than T1's timestamp.
Of course there is always the edge case where two commits have the same commit timestamp. Therefore from the perspective of Spanner, they happen simultaneously and there is no way to determine which happens first. But there is no need to. There is no causality relationship between them. If you insist, you can arbitrarily assign a happens-before relationship in your own code and nothing will break.
Alternatively, you could guarantee the same synchronization using PPS and PTP, delivered to the DCD pin of each host's serial port or to specialized hardware such as modern PTP-enabled smart NICs/FPGAs that can accept PPS input. GPS+PPS gets you to within 20-80ns global synchronization depending on implementation (assuming you're all mostly in the same inertial frame), and allows you to make much stronger guarantees than TrueTime (due to higher precision distributed ordering guarantees, which translate to lower latency and higher throughput distributed writes).
Of course, you can do this in good conditions. The extremely powerful part that TrueTime brings is how the system degrades when something goes wrong.
If everyone is synced to +/- 20ns, that's great. Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.
The other benefit of building in this uncertainty to the underlying software design is you don't have to have your entire infrastructure on the same hardware stack. If you have one datacenter that's 20yrs old, has no GPS infrastructure, and operates purely on NTP - this can still run the same software, just much more slowly. You might even keep some of this around for testing - and now you have ongoing data showing what will happen to your distributed system if GPS were to go away in a chunk of the world for some sustained period of time.
And in a brighter future, if we're able to synchronize everyone's clocks to +/- 1ns, the intervals just get smaller and we see improved performance without having to rethink the entire design.
> Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.
Most NTP/PTP appliances have internal clocks that are OCXO or rubidium that have holdover (even for several days).
If time is that important to you then you'll have them, plus perhaps some fibre connections to other sites that are hopefully out of range of the jamming.
> fibre connections to other sites that are hopefully out of range of the jamming.
I guess it's not inconceivable that eventually there's a global clock network using a White-Rabbit-like protocol over dedicated fibre. But if you have to worry about GPS jamming you probably have to worry about undersea cable cutting too.
Good thing cesium fountains are very accurate then...
In summary, with different business requirements you would build a different technical solution.
> and allows you to make much stronger guarantees than TrueTime (due to higher precision distributed ordering guarantees, which translate to lower latency and higher throughput distributed writes).
TrueTime is the software algorithm for managing the timestamps. It’s agnostic to the accuracy of the underlying time source. If it was inaccurate then you get looser bounds and as you note higher latency. Google already does everything you suggest for TrueTime while also having atomic clocks in places.
Yup! I was referring to the original TrueTime/Spanner papers, not whatever's currently deployed. The original paper describes distributed ordering guarantees at millisecond-scale precision, which implies many more transactions in flight in the uncertain state, and coarser distributed ordering guarantees than the much tighter upper bound you can set with nanosecond precision and microsecond comms latency...
More than a decade of progress, probably in no small part from Google pushing vendors to improve hardware :)
Amen. :)
TrueTime is based on GPS and local atomic clocks. Google's latest timemasters are even better, around 10ns average.
Isn't that because Google has its own atomic clocks, rather than NTP which is (generally) using publicly available atomic clocks?
More that they use GPS to synchronize the clocks. Having your own atomic clock doesn’t really improve your accuracy except within the single data center where you have it deployed (although I’m sure there are techniques for synchronizing with low bounds against nearby atomic clocks + GPS to get really tight bounds, so they don’t need one in every data center).
Depending on the application you would generally use PTP to get sub-microsecond accuracy. The real trick is that architecture should tolerate various clocks starting or jumping out of sync and self correct.
*misuse timestamps for synchronization
Unfortunate that the author doesn’t bring up FoundationDB version stamps, which to me feel like the right solution to the problem. Essentially, you can write a value you can’t read until after the transaction is committed and the synchronization infrastructure guarantees that value ends up being monotonically increasing per transaction. They use similar “write only” operations for atomic operations like increment.
The key here is a singleton sequencer component that stamps the new versions. There was a great article shared here on similar techniques used in trading order books (https://news.ycombinator.com/item?id=46192181).
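For illustration, a minimal single-process sketch of the sequencer idea (hypothetical names, not FoundationDB's actual implementation): all commits funnel through one component that stamps a strictly increasing version, so readers can rely on the stamps as a total order.

    import itertools
    import threading

    class Sequencer:
        def __init__(self):
            self._lock = threading.Lock()
            self._versions = itertools.count(1)

        def commit(self, writes, store):
            # Single point of serialization: the version is assigned at
            # commit time, so stamps are monotonic across all transactions.
            with self._lock:
                version = next(self._versions)
                for key, value in writes.items():
                    store[(key, version)] = value
                return version

    store = {}
    seq = Sequencer()
    v1 = seq.commit({"balance": 100}, store)
    v2 = seq.commit({"balance": 90}, store)
    assert v2 > v1  # the later commit always gets the larger stamp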
Agree this is the best solution, I’d rather have a tiny failover period than risk serialization issues. Working with FDB has been such a joy because it’s serializable it takes away an entire class of error to consider, leading to simpler implementation.
Yes. A consistent total ordering is what you need (want) in distributed computing. Ultimately, causality is what is important, but consistent ordering of concurrent operations makes things much easier to work with.
Consistent ordering of concurrent operations is easy though. Just detect this case (via logical clocks) then order using node ids or transaction ids if the logical clocks show the transactions as being concurrent. Am I missing something? This feels like a very solved problem. (I’ve worked on CRDTs where we have the same problem. There exist incredibly fast algorithms for this.)
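For example, a minimal vector-clock sketch of "detect concurrency, then tie-break deterministically" (illustrative only; production CRDT implementations are far more optimized):

    def compare(vc_a, vc_b):
        """Compare two vector clocks (dicts of node -> counter)."""
        keys = set(vc_a) | set(vc_b)
        a_le_b = all(vc_a.get(k, 0) <= vc_b.get(k, 0) for k in keys)
        b_le_a = all(vc_b.get(k, 0) <= vc_a.get(k, 0) for k in keys)
        if a_le_b and b_le_a:
            return "equal"
        if a_le_b:
            return "before"
        if b_le_a:
            return "after"
        return "concurrent"

    def total_order_key(vc, node_id):
        # Deterministic tie-break for concurrent events: every peer
        # computes the same key, so all replicas agree on one total order.
        return (sum(vc.values()), node_id)

    print(compare({"a": 2, "b": 1}, {"a": 3, "b": 1}))  # before
    print(compare({"a": 2, "b": 1}, {"a": 1, "b": 2}))  # concurrent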
> Am I missing something?
I don’t think so, I think it is solved in the general sense. However what Spanner does is unique, and it does use synchronised clocks in order to do it.
However, Spanner does not solve the inter-continental ACID database with high write throughput, so I don’t see it as groundbreaking. CRDTs are interesting, and I’ve followed your work for a long time, but they're too constrained to solve this general problem, I think.
Yes, though the API of having a write-only value that is a monotonically increasing counter is much simpler than having to think about causality or logical clocks.
For an article written about time, I would have thought there'd be a timestamp on the blog post. Just something to think about if someone stumbles upon this in a few years.
The article doesn't cover the inane stupid that is:
* NTP pool server usage requires using DNS
* people have DNSSEC setup, which requires accurate time or it fails
So if your clock is off, you cannot lookup NTP pool servers via DNS, and therefore cannot set your clock.
This sheer stupid has been discussed with package maintainers of major distros, with ntpsec, and the result is a mere shrug. Often, the answer is "but doesn't your device have a battery backed clock?", which is quite unhelpful. Many devices (routers, IOT devices, small boards, or older machines, etc) don't have a battery backed clock, or alternatively the battery may just have died.
Beyond that, the ntpsec codebase has a horrible bug where if DNS is not available when ntpsec starts, pool server addresses are never, ever retried. So if you have a complete power-fail in a datacentre rack, and your firewalls take a little longer to boot than your machines, you'll have to manually restart ntpsec to even get it to ever sync.
When discussing this bug the ntpsec lads were confused that DNS might not exist at times.
Long story short, make sure you aren't using DNS in any capacity, in NTP configs, and most especially in ntpsec configs.
One good source is just using the IPs provided by NIST. Pool servers may seem fine, but I'd trust IPs assigned to NIST to outlast any DNS name anyhow, e.g. for decades.
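A minimal chrony sketch of the "no DNS anywhere in the time path" advice (the 192.0.2.x addresses are placeholders from the documentation range, not real servers; substitute the actual IPs NIST publishes):

    # /etc/chrony.conf -- pin time servers by IP so syncing never
    # depends on working DNS (or DNSSEC, which itself needs correct time)
    server 192.0.2.10 iburst
    server 192.0.2.11 iburst
    # allow stepping the clock at boot when the initial offset is huge
    makestep 1.0 3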
Even just a single accurate clock is a nightmare... https://www.npr.org/2025/12/21/nx-s1-5651317/colorado-us-off...
I would not call "loses track of time if it's [partially] unplugged" a nightmare.
AWS has a Google TrueTime-equivalent precision clock available for public use[1], which makes this problem much easier to solve now. Aurora DSQL uses it. Even third-party DBs like YugabyteDB make use of it.
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time...
Related: https://632nm.com/episodes/why-syncing-atomic-clocks-is-virt...
As a teacher, I love the way Judah Levine explains things.
That’s because neither discrete time nor synchronous network comms exist.
Take a look at Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization - https://www.usenix.org/conference/nsdi18/presentation/geng (commercial version by the same authors - clockwork.io).
Another protocol that's not mentioned is PPS and its variants, such as WhiteRabbit.
A regular pulse is emitted from a specialized high-precision device, possibly over a specialized high-precision network.
Enables picosecond accuracy (or at least sub-nano).
As a user of WhiteRabbit, I can confirm a sub-10ps sync (two clocks phase lock) over 50km fiber connection for variable temperature of fiber (biggest problem of clock sync over fibers is temperature induced length change of the fiber itself, which needs to be measured and compensated).
Out of interest, how do you measure a sub-10ps phase lock between devices 50km apart?
The standards-compliant endpoints do all of the work. They count clock cycles for ping pong messages and share with each other the length of time so time-of-flight is tracked and compensated for.
Run 2 or 3 separate concurrent sync's and statistically compare the resulting clocks, for example.
Does wall clock time matter for anything but logging? For everything else one could just create any form of „time“ to keep stuff in sync, no?
Isn't it also useful for checking validity periods for stuff like TLS certs or JWTs or Kerberos tickets?
We could use some made-up „time“ for that, since it’s not meant for human consumption, just sync between different systems.
I suppose this is changing with TLS certs moving towards ephemerality, but they used to have an entry on someone's calendar for renewal.
I wouldn't say it's a 'nightmare'. It's just more complicated than how regular folk think computers work when it comes to time sync. There's nothing nightmareish or scary about this; it's just using the best solution for your scenario, understanding limitations, and adjusting expectations/requirements accordingly, perhaps relaxing consistency requirements.
I worked on the NTP infra for a very large organization some time ago, and the scariest thing I found was just how bad some of the clocks were on 'commodity hardware', but this just added a new parameter for triaging hardware for manufacturer replacement.
This is an ok article but it's just so very superficial. It goes too wide for such a deep subject matter.
Maybe. But I remember one game developer telling me that they face an even more challenging problem: synchronization between players in multiplayer real-time games. Just imagine different users having significantly different network latencies in a multiplayer shooter where a couple milliseconds can be decisive. Someone makes a headshot when the game state is already outdated. If you think about this, you can appreciate how complicated it is just to make the gameplay not awful...
And yet it has been done with different approaches, it's complicated, yes, as I said. But nothing nightmarish about it.
I took to distributed systems like a duck to water. It was only much later that I figured out that while there are things I can figure out in one minute that took other people five, there were a lot of others that you will have to walk them through step by step or they would never get there. That really explained some interactions I’d had when I was younger.
In particular I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics, who never took intro to computer engineering.
> I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics
Yeah. I was a physics major and it really helped to have had my naive assumptions about time and clocks completely demolished early on by taking classes in special and general relativity. When I eventually found my way into tech, a lot of distributed systems concepts that are difficult for other people (clock sync, indeterminate ordering of events, consensus) came quite naturally because of all that early training.
I think it's no accident that distributed systems theory guru Leslie Lamport had written an unpublished book on general relativity before he wrote the famous Time, Clocks and the Ordering of Events in a Distributed System paper and the Paxos paper. In the former in particular, the analogy to special relativity is quite plain to see.
PTP isn't even that much more difficult, as long as you planned for it from the start
you buy the hardware, plug it all in, and it works
We once spent two weeks identifying a PTP handling bug in a particular Cisco switch firmware on a production site.
Sometimes hardware that has PTP support in the specs doesn't perform very well though, so if you do things at scale, being able to validate things like switches and network card drivers is useful too!
It's to the point timing server vendors I've spoken to have their own test labs where they have to validate network gear and then publish lists of recommended and tested configurations.
Even some older cards where you'd think the PTP issues would be solved still have weird driver quirks in Linux!
Reminds me of the old saying: 'If you have just one watch/clock, then you always know what time it is; but if you have two of them, then you are never sure!'
Clock sync is such a nightmare in robotics. Most OSes happily will skew/jump to get the time correct. Time jumps (especially backwards) will crash most robotics stacks. You might decide to ensure that you have synced time before starting the stack. Great, now your timestamps are mostly accurate, except what happens when you've used GPS as your time source, and you start indoors? Robot hangs forever.
Hot take: I've seen this and enough other badly configured time sync settings that I want to ban system time from robotics systems - time from startup only! If you want to know what the real world time was for a piece of data after, write what your epoch is once you have a time sync, and add epoch+start time.
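A minimal sketch of that pattern in Python (names hypothetical): stamp everything with the monotonic clock, record the wall-clock epoch at most once if sync ever arrives, and reconstruct wall time after the fact.

    import time

    boot_ref = time.monotonic()   # never jumps, even if NTP steps the clock
    wall_epoch = None             # filled in at most once, after time sync

    def stamp():
        """Timestamp for all sensor data: seconds since startup."""
        return time.monotonic() - boot_ref

    def on_time_sync():
        """Call once when GPS/NTP lock is achieved (may never happen indoors)."""
        global wall_epoch
        wall_epoch = time.time() - stamp()

    def to_wall_time(t_startup):
        """Convert a startup-relative stamp to wall time, if we ever synced."""
        return None if wall_epoch is None else wall_epoch + t_startup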
If your requirements are “must have accurate time, must start with an inaccurate time, must not step time during operation, no atomic clocks, must not require a network connection, or a WWVB signal, must work without a GPS signal” then yes, you need to relax your requirements.
But it doesn’t have to be the first requirement you relax.
If it has a GPS already, it’s really easy to fall into the trap of just using it, but point taken. The main requirement is accurate moment-to-moment time. Using GPS as the master clock mostly makes sense there.
C++11 distinguishes system_clock from steady_clock. As you say, using system_clock is a bug.
THIS is what will save us from the robot uprising!
the Huygens algorithm is also worth a look
https://www.usenix.org/system/files/conference/nsdi18/nsdi18...
Normally I would nod at the title. Having lived it.
But I just watched/listened to a Richard Feynman talk on the nature of time and clocks and the futility of "synchronizing" clocks. So I'm chuckling a bit. In the general sense, I mean. Yes yes, for practical purposes in the same reference frame on earth, it's difficult but there's hope. Now, in general ... synchronizing two clocks is ... meaningless?
https://www.youtube.com/watch?v=zUHtlXA1f-w
Feynman was not entirely sincere. The implosion of a nuclear device requires precise synchronization of multiple detonations; basically, the more precisely you can trigger them, the less fissile material you need for the sphere. To this day, high-accuracy bridgewire/foil bridge designs remain on ITAR.
Einstein was worried about whether people in two different relativistic frames would see cause and effect reversed.
Wild. My layperson mind goes to a simple example, which may or may not be possible, but please tell me if this is the gist:
Alice and Bob, in different reference frames, both witness events C and D occurring. Alice says C happened before D. Bob says D happened before C. They're both correct. (And good luck synchronizing your watches, Alice and Bob!)
Yes, that definitely happens. People orbiting Polaris would see two supernovas explode at different times than we do, due to the speed of light. Polaris is 400 light years away, so the gap could be large.
But when you are moving you may see very closely spaced events in different order, because you’re moving toward Carol but at an angle to Doug. Versus someone else moving toward Doug at an angle to Carol.
That will be the case when Alice stands close to where C happens, and Bob stands close to where D happens.
It's a little trickier to imagine introducing cause-and-effect though. (Alice sees that C caused D to happen, Bob sees that D caused C to happen).
I think a "light cone" is the thought-experiment to look up here.
There is distinction between seeing when events happened, and when they really happened. The latter can be reconstructed by an observer.
In special relativity, time is relative and when things actually happened can be different in different frames. Causally linked events are always really in the same order. But disconnected events can be seen in different orders depending on speed of observer.
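For the curious, the standard special-relativity bookkeeping behind that last sentence (a textbook sketch): the Lorentz transformation gives

    \Delta t' = \gamma \left( \Delta t - \frac{v \, \Delta x}{c^2} \right)

For spacelike-separated events, $|\Delta x| > c\,\Delta t$, so an observer with speed $v$ satisfying $v\,\Delta x / c^2 > \Delta t$ (achievable with $|v| < c$) measures $\Delta t' < 0$ and sees the order reversed. For timelike (causally connectable) events, $c\,\Delta t > |\Delta x|$, and no observer with $|v| < c$ can flip the sign.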
> But disconnected events can be seen in different orders depending on speed of observer.
What are "disconnected events"? In a subtle but still real sense, are not all events causally linked? e.g. gravitationally, magnetically, subatomically or quantumly?
I can understand that our simple minds and computational abilities lead us to consider events "far away" from each other as "disconnected" for practical reasons. But are they really not causally connected in a subtle way?
There are pieces of space time that are clearly, obviously causally connected to each other. And there are far away regions of the universe that are, practically speaking, causally disconnected from things "around here". But wouldn't these causally disjoint regions overlap with each other, stringing together a chain of causality from anywhere to anywhere?
Or is there a complete vacuum of insulation between some truly disconnected events that don't overlap with any other observational light cone or frame of reference at all?
We now know that gravity moves at the speed of light. Imagine that there are two supernovas that, for some unknown reason, explode at essentially the same time. Just before you die from radiation exposure, you will see the light pulse from each supernova before each supernova can 'see' the gravitational disruption caused by the other. Maybe a gravity wave can push a chain reaction on the verge of happening into either a) happening or b) being delayed for a brief time, but the second explosion happened before the pulse from the first could have arrived. So you're pretty sure they aren't causally linked.
However if they were both triggered by a binary black hole merger, then they're dependent events but not on each other.
But I think the general discussion is more of a 'Han shot first' sort. One intelligent system reacting to an action of another intelligent system, and not being able to discern as a person from a different reference frame as to who started it and who reacted. So I suppose when we have relativistic duels we will have to preserve the role of the 'second' as a witness to the events. Or we will have to just shrug and find something else to worry about.
Causality moves at the speed of light. Events separated by more distance than light can cross in the time between them are called spacelike-separated, and they aren't causally connected.
I think you might be confusing events that have some shared history with events that influence each other. Say right now a Martian rover sends a message to Earth and Earth sends a message to the rover: those aren't causally connected, because neither side knows about the other's message until the light-speed delay has passed.
> But wouldn't these causally disjoint regions overlap with each other
Yes.
> stringing together a chain of causality from anywhere to anywhere?
No? Causality reaching one edge of a sphere doesn't mean it instantaneously teleports to every point in that same sphere. This isn't a transitive relationship.
> What are "disconnected events"?
The sentence you're responding to seems like a decent definition. Disconnected events are events which might be observed in either order depending on the position of an observer.
If Bob and Alice are moving at half the speed of light in opposite directions.
it might be meaningless, but in practical terms just don't check util.c from the gravity well into the git repo in orbit.
Vector clocks are one of the other things Barbara Liskov is known for.
Absolute synchronization impossible?? Challenge accepted.
Nature (the laws of physics) is against you on this: it is in fact impossible, for everyone. What is in sync for some observers can be out of sync for others (depending on where they are, i.e. gravity, and how they move relative to each other). See the relativity of simultaneity in special and general relativity [1].
1. https://en.wikipedia.org/wiki/Relativity_of_simultaneity
I think you just nerd-sniped me but I’m not convinced it’s impossible to assign a consistent ordering to events with relativistic separations.
For starters, the spacetime interval between two events IS a Lorentz invariant quantity. That could probably be used to establish a universal order for timelike separations between events. I suspect that you could use a reference clock, like a pulsar or something to act as an event against which to measure the spacetime interval to other events, and use that for ordering. Any events separated by a light-like interval are essentially simultaneous to all observers under that measure.
The problem comes for events with a space like or light like separation. In that case, the spacetime interval is still conserved, but I’m not sure how you assign order to them. Perhaps the same system works without modification, but I’m not sure.
For any space-like event you can find reference frames where things happen in different order. For the time-like situation you described the order indeed exists within the cone, which is to say that causality exists.
In physics, time is local and relative; independent events don’t need a global ordering. Distributed databases shouldn’t require one either. The idea of a single global time comes from 1980s single-node database semantics, where serializability implied one universal execution order. When that model was lifted into distributed systems, researchers introduced global clocks and timestamp coordination to preserve those guarantees, not because distributed systems fundamentally need them. It’s time we rethink this. Only operations that touch the same piece of data require ordering. Everything else should follow causality, like the physical universe: independent events don’t need to agree on sequence, only dependent ones do. Global clocks exist because some databases forced serializable cross-object transactions onto distributed systems, not because nature requires it. Edit: I welcome discussion with people who disagree and downvote.
You can’t be certain that any given mutating operation you perform now won’t be relied upon for some future operation, unless the two operations are performed in entirely different domains of data. Even “not touching (by which I assume you mean mutating) the same data” isn’t enough. If I update A in thread 0 from 1 to 2, then I update B in thread 1 to the value of A+1, then the value of B could end up being 2 or 3, depending on whether the update of A reached thread 1.
In distributed systems, dependencies flow forward, not backward. Causal dependency only exists when an operation actually references earlier state. If B = A+1, then yes, B is causally dependent on A and they must share an order. But that dependency is created by the application logic, not assumed globally in advance.
We shouldn’t impose a universal timeline just because some future operation might depend on some past one. Dependencies should be explicit and local: if two operations interact, they share a causal scope; if they don’t, they shouldn’t pay the cost of coordination.
Timesync isn’t a nightmare at all. But it is a deep rabbit hole.
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
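A minimal sketch of that model (all numbers made up): each edge holds the current affine estimate between two clocks, and converting a timestamp composes the estimates along a path; fewer hops means less accumulated jitter.

    from collections import deque

    # edges[(a, b)] = (skew, offset): t_b ~= skew * t_a + offset, per the
    # latest estimate; estimates drift, so every conversion is lossy
    edges = {
        ("sensor", "host"): (1.0000021, 0.0143),
        ("host", "gps"): (0.9999998, -3.2e-4),
    }
    for (a, b), (m, c) in list(edges.items()):
        edges[(b, a)] = (1.0 / m, -c / m)   # invert each mapping

    def convert(t, src, dst):
        """Breadth-first search over the clock graph, composing affine maps."""
        queue, seen = deque([(src, t)]), {src}
        while queue:
            node, value = queue.popleft()
            if node == dst:
                return value
            for (a, b), (m, c) in edges.items():
                if a == node and b not in seen:
                    seen.add(b)
                    queue.append((b, m * value + c))
        raise ValueError(f"no path from {src} to {dst}")

    print(convert(120.5, "sensor", "gps"))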
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
TicSync is cool and useful. https://sci-hub.se/10.1109/icra.2011.5980112
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
At this stage, it's difficult to find a half-decent-quality ethernet MAC that doesn't have PTP timestamping. It's not a particularly complicated protocol, either.
I needed to distribute PPS and 10MHz into a GNSS-denied environment, so last summer I designed a board to do this using 802.1AS gPTP with a uBlox LEA-M8T GNSS timing receiver, a 10MHz OCXO and an STM32F767 MCU. This took me about four weeks. Software is written in C, and the PTP implementation accounts for 1500 LOC.
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent of PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
> This sounds like the "peer to peer" equivalent of PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock.
Well, it requires having the conversion function for each edge in the traversed path. And such function needs to exist only at the location(s) performing the conversion.
> obviously it adds complexity to end-stations beyond what PTP requires
If you have PTP and it works then stick with it. If you’re trying to timesync a network of wearable devices then you don’t have PTP stamping hardware.
> because packets traversing different routes
Fair callout. It’s probably a more useful model for less internet-y use cases. Of which there are many!
For example when trying to timesync a collection of different sensors on different devices/microcontrollers.
Roboticists like CAN bus and EtherCAT. But even that is kinda overkill imho. TicSync can get you tens of microseconds of precision in user space.
"I kinda don’t like PTP. Too complicated and requires specialized hardware."
?????
I run PTP on everything from RPI's to you name it, over fiber, ethernet, etc.
The main thing hardware gives is filtering of PTP packets or hardware timestamping.
Neither is actually required, though some software has decided to require it.
Additionally, something like 99% of sold gigabit or better chipsets since 2012 support it (I210 et al)
Robots and VR headsets and wearables and microcontrollers and sensors and trackers and Linux and Windows oh my!
Love learning new things. This also explains why my Casio clock's sync starts skewing over time.
PTP requires support not only on your network, but also on your peripheral bus and inside your CPU. It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states.
You can if you just run PTP (almost) entirely on your NIC. The best PTP implementations take their packet timestamps at the MAC on the NIC and keep time based on that. Nothing about CPU processing is time-critical in that case.
Well, if the goal is for software running on the host CPU to know the time accurately, then it does matter. The control loop for host PTP benefits from regularity. Anyway NICs that support PTP hardware timestamping may also use PCI LTR (latency tolerance reporting) to instruct the host operating system to disable high-exit-latency sleep features, and popular operating systems respect that.
> The control loop for host PTP benefits from regularity.
How much regularity? If you sent PTP packets with 5 milliseconds of randomness in the scheduling, does that cause real problems? It's still going to have an accurate timestamp.
> instruct the host operating system to disable high-exit-latency sleep features
Why, though? You didn't explain this. As long as the packet got timestamped when it arrived, the CPU can ask the NIC how many nanoseconds ago that was, and correct for how long it was asleep. Right?
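That matches how Linux exposes NIC clocks. A minimal sketch (Linux-only, assumes a PTP-capable NIC exposing /dev/ptp0): userspace can read the PTP hardware clock directly via the kernel's dynamic clock-id convention, with no time-critical CPU work involved.

    import os
    import time

    # Linux maps an open /dev/ptpN fd to a clock id: (~fd << 3) | 3
    fd = os.open("/dev/ptp0", os.O_RDONLY)
    phc_clockid = (~fd << 3) | 3

    nic_time = time.clock_gettime(phc_clockid)          # NIC hardware clock
    sys_time = time.clock_gettime(time.CLOCK_REALTIME)  # host wall clock
    print(f"PHC minus system clock: {nic_time - sys_time:+.9f} s")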
"Well, if the goal is for software running on the host CPU to know the time accurately, then it does matter. "
I'm sorry, this is just moving the goalposts.
You said "It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states."
This is flat wrong, as pointed out.
Now you are pedantically arguing that some NIC's that do PTP hardware timestamping might also use a feature that some operating systems might respect.
That's a very far cry from "It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states".
In most cases, people would just say "hey i was wrong about that but there are cases that i think matter where it falls down".
I see nothing in your pair of unnecessarily belligerent comments that actually contradicts what I said. There are host-side features that enable the clock discipline you are observing, even if you are apparently not aware of them.
This is a really helpful contribution - if only everyone could be as smart as you.
If mine are somehow too belligerent for you, which is hilarious given how arrogant and belligerent your initial comment and responses come off as (maybe you are not aware?), then perhaps you'd like to actually engage with any of the other comments that point out how wrong you are in a meaningful way?
Or are those too belligerent as well?
Because you didn't respond to any of those, either.
How so? If the NIC is processing the timestamps as it arrives/leaves on the wire, the latency and jitter in the rest of the system shouldn't matter.
PTP does not require support on your network beyond standard ethernet packet forwarding when used in ethernet mode.
In multicast IP mode, with multiple switches, it requires what anything running multicast between switches/etc would require (i.e. some form of IGMP snooping or multicast routing or .....)
In unicast IP mode, it requires nothing from your network.
Therefore, i have no idea what it means to "require support on the network".
I have used both ethernet and multicast PTP across a complete mishmash of brands and types and medias of switches, computers, etc, with no issues.
The only thing that "support" might improve is more accurate path delay data through transparent clocks. If both master and slave do accurate hardware timestamping already, and the path between them is constant, it is easily possible to get +-50 nanoseconds without any transparent clock support.
Here are the stats from a random embedded device running PTP I just accessed a second ago:
So this embedded ARM device, which is not special in any way, is maintaining time within +-35ns of the grandmaster, and currently within 30ns of GPS time. The card does not have an embedded hardware PTP clock, but it does do hardware timestamping and filtering.
This grandmaster is an RPI with an intel chipset on it and the PPS input pin being used to discipline the chipset's clock. It stays within +-2ns (usually +-1ns) of GPS time.
Obviously, holdover sucks, but not the point :)
This qualifies as better-than-NTP for sure, and this setup has no network support. No transparent clocks, etc. These machines have multiple media transitions involved (fiber->ethernet), etc.
The main thing transparent clock support provides in practice is dealing with highly variable delay. Either from mode of transport, number of packet processors in between your nodes, etc. Something that causes the delay to be hard to account for.
The ethernet packet processing in ethernet mode is being handled in hardware by the switches and basically all network cards. IP variants would probably be hardware assisted but not fully offloaded on all cards, and just ignored on switches (assuming they are not really routers in disguise).
The hardware timestamping is being done in the card (and the vast majority of ethernet cards have supported PTP hardware timestamping for >1 decade at this point), and works perfectly fine with deep CPU sleep states.
Some don't do hardware filtering, so they're essentially processing more packets than necessary, but .....
> Google faced the clock synchronization problem at an unprecedented scale with Spanner, its globally distributed database. They needed strong consistency guarantees across data centers spanning continents, which requires knowing the order of transactions.
> Here’s a video of me explaining this.
Do you need a video? Do we need a 42 minute video to explain this?
I generally agree with Feynman on this stuff. We let explanations be far more complex than they need to be for most things, and it makes the hunt for accidental complexity harder because everything looks almost as complex as the problems that need more study to divine what is actually going on there.
For Spanner to be useful they needed a high transaction rate and in a distributed system that requires very tight grace periods for First Writer Wins. Tighter than you can achieve with NTP or system clocks. That’s it. That’s why they invented a new clock.
Google puts it this way:
Under external consistency, the system behaves as if all transactions run sequentially, even though Spanner actually runs them across multiple servers (and possibly in multiple datacenters) for higher performance and availability.
But that’s a bit thick for people who don’t spend weeks or years thinking about distributed systems.
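If it helps, here's a minimal sketch of that grace-period ("commit wait") idea, assuming a TrueTime-like API that returns an (earliest, latest) uncertainty interval; this is illustrative, not Spanner's actual code:

    import time

    EPSILON = 0.007  # assumed clock uncertainty bound, e.g. ~7 ms

    def tt_now():
        """Hypothetical TrueTime stand-in: an (earliest, latest) interval."""
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def commit_wait():
        """Pick a commit timestamp, then wait out the uncertainty."""
        s = tt_now()[1]           # commit timestamp: top of the interval
        while tt_now()[0] <= s:   # wait until s is certainly in the past
            time.sleep(0.001)
        return s                  # now safe to make the commit visible

The smaller the uncertainty bound, the shorter the wait, which is exactly why tighter clock sync translates directly into higher transaction throughput.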