I've noticed that the formal methods work described by AWS parallels what we need in technical documentation. Complex systems require not just formal verification but also structured documentation following MECE principles (Mutually Exclusive, Collectively Exhaustive).
In my experience, the interfaces between components (where most errors occur) are exactly where fragmented documentation fails. I implemented a hierarchical documentation system for my team that organizes knowledge as a conceptual tree, and the accuracy of code generation with AI assistants improved notably.
Formal verification tools and structured documentation are complementary: verification ensures algorithmic correctness while MECE documentation guarantees conceptual and contextual correctness. I wonder if AWS has experimented with structured documentation systems specifically for AI, or if this remains an area to explore.
There is another way formal methods parallel documentation: both are futile unless you can prove that the modeled/documented system matches the actual, live one.
"This is just a matter of discipline" is not very convincing, especially when the discipline involves long unpaid after-hours work.
The examples I've seen in this report from AWS are mostly about one-shot events (helping teams through important changes). It's good to see formal methods used in these cases, of course, but I'd really like to read stories about how sustained use of formal methods helps recoup the high initial investment as the actual system evolves alongside the model.
At least in the TLA+ community, the new state-of-the-art approach is to use the formal model to generate a test suite.
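A toy sketch of the idea in Python (the counter model, the trace enumeration, and the `SystemUnderTest` wrapper are all invented for illustration; in practice the traces would be exported from the model checker rather than hand-rolled):

```python
import itertools

# Toy "formal model" of a bounded counter: states are integers, actions are
# "inc" and "reset". A real setup would export traces from TLC or similar.
def model_step(state, action):
    if action == "inc":
        return min(state + 1, 3)
    return 0  # "reset"

def model_traces(length):
    """Enumerate every action sequence up to the given length."""
    return itertools.product(["inc", "reset"], repeat=length)

class SystemUnderTest:
    """Hypothetical wrapper around the real implementation."""
    def __init__(self):
        self.value = 0
    def apply(self, action):
        self.value = min(self.value + 1, 3) if action == "inc" else 0
        return self.value

def run_generated_tests(trace_len=4):
    for trace in model_traces(trace_len):
        sut, model_state = SystemUnderTest(), 0
        for action in trace:
            model_state = model_step(model_state, action)
            actual = sut.apply(action)
            assert actual == model_state, f"divergence on {trace}: {actual} != {model_state}"

run_generated_tests()
```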
That is interesting. Link?
The part about S3 using lightweight formal methods in their ShardStore Rust codebase describes ongoing work that operates on the system itself, not on a model.
Formalization and correctness are about models. [1]
There are formal methods where the underlying model is mathematically sound. There are semi-formal methods where the underlying model is structured but not proven to be sound.
For example, in your case ("organizes knowledge") a formal method is ologs from category theory. That method assures that the concepts and their relationship in your knowledge base are mathematically correct.
When you want to transfer that knowledge base into a documentation system, you might want to look for a mathematically sound model, but I'm afraid there is no such model, so what's left is a semi-formal method, a likely-correct model.
Right now I'm building such a likely-correct model for writing, contact me for more info.
[1] https://www.osequi.com/slides/formalism-correctness-cost/for...
Thank you for introducing me to ologs - they're fascinating. I'm intrigued by the distinction you make between formal methods and semi-formal approaches.
I'd love to explore how your "likely-correct model for writing" might complement the MECE-based documentation system I've been developing. Since you're building a model focused on writing correctness, perhaps there's an opportunity to collaborate?
My approach addresses the organizational structure of documentation, while your work seems to focus on ensuring correctness at the content level. Together, these could potentially create a more comprehensive solution that combines hierarchical organization with formal correctness principles.
Would you be interested in discussing further? I'm curious to learn more about your model.
Could you explain how your "hierarchical documentation system" works? Does the tree parallel the code? Is there a particular tool that you use?
The hierarchical system I've developed organizes knowledge in nested MECE categories (Mutually Exclusive, Collectively Exhaustive). Rather than paralleling code exactly, it creates a conceptual tree that represents the mental model behind the system.
For example, in an e-commerce API:
- Level 1: Core domains (Products, Orders, Users)
- Level 2: Sub-domains (Product variants, Order fulfillment, User authentication)
- Level 3: Implementation details (Variant pricing logic, Fulfillment status transitions)
I'm currently using plain Markdown with a specific folder structure, making it friendly to version control and AI tools. The key innovation is the organization pattern rather than the tooling - ensuring each concept belongs in exactly one place (mutually exclusive) while collectively covering all needed context (collectively exhaustive).
I should note that this is a work in progress - I've documented the vision and principles, but the implementation is still evolving. The early results are promising though: AI assistants navigating this conceptual tree find the context needed for specific coding tasks more effectively, significantly improving the quality of generated code.
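As a rough illustration of the mutual-exclusivity side, here is a small Python sketch (the docs/ layout and the "one H2 heading = one concept" convention are assumptions of the example, not part of any existing tool) that flags a concept documented in more than one file:

```python
from pathlib import Path
from collections import defaultdict

def find_duplicate_concepts(docs_root="docs"):
    """Walk a Markdown tree and report any H2 'concept' owned by several files."""
    owners = defaultdict(list)
    for md_file in Path(docs_root).rglob("*.md"):
        for line in md_file.read_text(encoding="utf-8").splitlines():
            if line.startswith("## "):
                concept = line[3:].strip().lower()
                owners[concept].append(md_file)
    return {c: files for c, files in owners.items() if len(files) > 1}

if __name__ == "__main__":
    for concept, files in find_duplicate_concepts().items():
        print(f"'{concept}' appears in multiple places: {[str(f) for f in files]}")
```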
The question would be whether, with AI assistants, you need to go through the effort of enforcing MECE or can just strive for exhaustiveness and let them sort out the repetition. I am also wondering whether there is a signal in repetition, or even in conflicts, within documentation.
I'm interested in learning more about the benefits of MECE - I've never heard that before. In particular, it seems radically different from Divio's system [0], which presents the same information in many different ways.
0: https://docs.divio.com/documentation-system/
Great question about MECE vs Divio's system! They actually complement each other rather than conflict.
MECE (Mutually Exclusive, Collectively Exhaustive) comes from management consulting and focuses on organizing concepts without overlap or gaps. Divio's system focuses on documentation types (tutorials, how-to, reference, explanation).
In the AI era, your question raises a fascinating point: if AI can dynamically adapt content to the user's needs, do we still need multiple presentation formats? I believe we do, but with a shift in approach.
With AI tools, we can maintain a single MECE-structured knowledge base (optimized for conceptual clarity and AI consumption) and then use AI to dynamically generate Divio-style presentations based on user needs. Rather than manually creating four different document types, we can have AI generate the appropriate format on demand.
In my experiments, I've found that a well-structured MECE knowledge base allows AI to generate much more accurate tutorials, how-tos, references, or explanations on demand. The AI adapts the presentation while drawing from a single source of truth.
This hybrid approach gives us the best of both worlds: conceptual clarity for AI consumption, and appropriate presentation for human needs - all while reducing the maintenance burden of multiple document versions.
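A minimal sketch of that pipeline in Python, with the docs/ layout and the `call_llm` function as placeholders for whatever knowledge base and model API you actually use:

```python
from pathlib import Path

# Generate a Diataxis-style view on demand from a single MECE knowledge base.
DIATAXIS_PROMPTS = {
    "tutorial": "Write a step-by-step tutorial for a newcomer.",
    "how-to": "Write a task-oriented how-to guide.",
    "reference": "Write terse reference documentation.",
    "explanation": "Write a conceptual explanation of the design.",
}

def gather_context(docs_root, topic):
    """Collect the MECE nodes whose path or content mentions the topic."""
    return "\n\n".join(
        p.read_text(encoding="utf-8")
        for p in Path(docs_root).rglob("*.md")
        if topic.lower() in p.as_posix().lower()
        or topic.lower() in p.read_text(encoding="utf-8").lower()
    )

def generate_view(docs_root, topic, mode, call_llm):
    context = gather_context(docs_root, topic)
    prompt = f"{DIATAXIS_PROMPTS[mode]}\n\nUse only this context:\n{context}"
    return call_llm(prompt)
```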
Is this not https://diataxis.fr/ ?
(I'm engaged somewhat in trying to get our team to write any documentation; once I've got that, I'll start trying to organize along exactly these principles)
Yes, that's Diataxis (formerly Divio). I faced similar challenges and found that combining it with MECE principles in my PAELLADOC framework made documentation much easier, especially with AI tools. Good luck getting your team started
MECE seems to be just a mathematical partition, so there seems to be no need for a new term and acronym.
Leslie Lamport gave the closing keynote at SCaLE 22x this year, talking about formal methods and TLA+. He mentioned some previous work Amazon has done in that area.
https://www.youtube.com/watch?v=tsSDvflzJbc
> Coding isn't Programming - Closing Keynote with Leslie Lamport - SCaLE 22x
Thanks for sharing!
Very interesting video. Many thanks
> we also sought a language that would allow us to model check (and later prove) key aspects of systems designs while being more approachable to programmers.
I find it a bit surprising that TLA+ with PlusCal can be challenging to learn for your average software engineer. Could anyone with experience in P show an example of something that is difficult to express in TLA+ but significantly easier in P?
It's probably just the syntax
I think it goes beyond that.
TLA+ requires you to think about problems very differently. It is very much not programming.
It took me a long time (and many false starts) to be able to think in TLA. To quote Lamport, the real challenge of TLA+ is learning to think abstractly.
To that point, I think PlusCal and the teaching materials around it do a disservice to TLA+. The familiar syntax hides the wildly different semantics.
A key concern I've consistently had regarding formal verification systems is: how does one confirm the accuracy of the verifier itself?
This issue appears to present an intrinsically unsolvable problem, implying that a formally verified system could still contain bugs due to potential issues in the verification software.
While this perspective doesn't necessarily render formal verification impractical, it does introduce certain caveats that, in my experience, are not frequently addressed in discussions about these systems.
I'd guess the best approach is similar to security - minimal TCB (trusted computing base) that you can verify by hand, and then construct your proof checker on top, and have it verify itself. Proof and type checkers have a lot of overlap, so you can gain coverage via the language types.
I can't recall exactly, but I think CakeML (https://cakeml.org/jfp14.pdf) had a precursor lisp prover, written in a verified lisp, so would be amenable to this approach.
EDIT: think it was this one https://www.schemeworkshop.org/2016/verified-lisp-2016-schem...
This is correct. That's why you have core type theories: very small, easy to understand, and implementable in a canonical way. Other languages can be compiled to them.
A deeply considered take: this is theoretically interesting but in practice matters very little.
It’s like worrying about compiler bugs instead of semantic programming errors. If you are getting to the point where you have machine checked proofs, you are already doing great. It’s way more likely that you have a specification bug or modeling error than an error in the solver.
If you are worried about it, many people have solutions that are very compelling. There are LCF style theorem provers like HOL Light with tiny kernels. But for any real world systems, none of this matters, use any automated solvers you like, it will be fine.
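For a sense of how small such a kernel can be, here is a toy LCF-style sketch in Python (implication over string atoms only; invented for illustration and far simpler than HOL Light's actual kernel). Theorems can only be constructed through the kernel's inference rules, so only those few lines need to be trusted:

```python
class Theorem:
    """A theorem value; only the kernel functions below may construct one."""
    def __init__(self, hyps, concl, _token=None):
        if _token is not _KERNEL:
            raise ValueError("Theorems may only be built by kernel rules")
        self.hyps, self.concl = frozenset(hyps), concl

_KERNEL = object()

def assume(p):
    """ASSUME: derive  {p} |- p."""
    return Theorem({p}, p, _token=_KERNEL)

def modus_ponens(th_imp, th_ant):
    """MP: from  G1 |- (a -> b)  and  G2 |- a  derive  G1 u G2 |- b."""
    op, a, b = th_imp.concl
    assert op == "->" and th_ant.concl == a
    return Theorem(th_imp.hyps | th_ant.hyps, b, _token=_KERNEL)

# Example: {(p -> q), p} |- q
imp = assume(("->", "p", "q"))
ant = assume("p")
print(modus_ponens(imp, ant).concl)  # prints: q
```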
Verifiers can be based on a small proven kernel. That is not really the issue.
The issue is writing the formal specification. Formal verification is, in a nutshell, just proving the equivalence of two different formal constructs: one called the spec and one called the system (or program). Writing specs is hard.
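A toy Python example of the spec/system split (invented for illustration): the spec states declaratively what a correct result looks like, the system is a concrete implementation, and here we only check agreement exhaustively over tiny inputs rather than proving it in general.

```python
from itertools import product

def spec_ok(inp, out):
    """Declarative spec: output is ordered and is a permutation of the input."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    permutation = sorted(inp) == sorted(out)
    return ordered and permutation

def system(inp):
    """The 'system': a concrete implementation we want to check against the spec."""
    return sorted(inp)

def bounded_check(max_len=4, values=range(3)):
    # Bounded, exhaustive check over small inputs (model checking in spirit).
    for n in range(max_len + 1):
        for inp in product(values, repeat=n):
            assert spec_ok(list(inp), system(list(inp))), f"counterexample: {inp}"

bounded_check()
print("spec and system agree on all checked inputs")
```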
Perhaps there is no such thing as absolute truth.
In category theory / ologs, a formal method for knowledge representation, the result is always mathematically sound, yet ologs are prefixed with "According to the author's world view ..."
On the other hand, truth, even when it's not absolute, is very expensive.
Lately AWS advocates a middle ground, lightweight formal methods, which are much cheaper than full formal methods yet deliver good-enough correctness for their business case.
In the same way, MIT CSAIL's Daniel Jackson advocates a semi-formal approach to designing likely-correct apps (Concept Design).
It seems there is a push for better correctness in software, without aiming at perfectionism.
Yeah it reminds me of https://en.m.wikipedia.org/wiki/Tarski%27s_undefinability_th...
> ...it does introduce certain caveats that, in my experience, are not frequently addressed in discussions about these systems.
What are those?
As far as I can tell, people don't deploy verified or high assurance systems without testing or without online monitoring and independent fail safes.
There are many things people try to do here: multiple implementations of the verifier, formally verified implementations of the verifier in other verification tools, by-hand verification, etc.
This is not just a problem with formal verification systems. It's a problem intrinsic to math, science and reality.
We call these things axioms. The verifier must be assumed to be true.
For all the axioms in mathematics... we don't know if any of them are actually fully true. They seem to be true because we haven't witnessed a case where they aren't. This is similar to how we verify programs with unit tests. For the verifier it's the same thing: you treat it like an axiom and "verify" it with tests.
I believe there is a distinction between those concepts.
Axioms are true by their own definition. Therefore, discovering an axiom to be false is a concept that is inherently illogical.
Discovering that a formal verification system produced an incorrect output due to a bug in its implementation is a perfectly well-defined concept and doesn't lead to any logical contradictions, unless you axiomatically define the output of formal verification systems to be necessarily correct.
I believe that definition doesn't make sense in the general case, considering the number of vectors that can introduce errors into software.
Being true by definition is just assigning something to be true by assumption.
Who makes a definition? Are definitions intrinsic to reality? No. Definitions are made up concepts from language which is also another arbitrary made up thing. When something is defined to be an axiom the human is making something up and arbitrarily assigning truth to that concept.
> I believe that definition doesn't make sense in the general case, considering the number of vectors that can introduce errors into software.
You’re making up a definition here. Which is the same as an axiom.
Your question is what verifies the verifier? But that question is endless because what verifies the verifier of the verifier? It’s like what proves the axiom? And what proves the proof of the axiom?
The verifier defines the universe of what is correct and what is incorrect. It is the axiom. It is an assumption at its core. You can pretty it up by calling it a more advanced thing like “definition” but in the end it is the exact same thing as assuming and making up a fact that something is true without any proof.
I've recently created a likely-correct piece of software based on these principles.
https://www.osequi.com/studies/list/list.html
The structure (ontology, taxonomy) is created with ologs, a formal method from category theory. The behavior (choreography) is created with a semi-formal implementation (XState) of a formal method (Finite State Machines)
The user-facing aspect of the software is designed with Concept Design, a semi-formal method from MIT CSAIL.
Creating software with these methods is refreshing and fun. Maybe one day we can reach Tonsky's "Diagrams are code" vision.
https://tonsky.me/blog/diagrams/
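For flavor, here is a Python analogue of the kind of explicit state machine XState gives you (the states and events are invented; the actual project uses XState in JavaScript/TypeScript). Behavior is a pure transition table, so illegal moves simply cannot happen:

```python
# Explicit finite state machine for a data-fetching flow.
TRANSITIONS = {
    ("idle",    "FETCH"):   "loading",
    ("loading", "RESOLVE"): "success",
    ("loading", "REJECT"):  "failure",
    ("failure", "RETRY"):   "loading",
}

def step(state, event):
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
    return next_state

state = "idle"
for event in ["FETCH", "REJECT", "RETRY", "RESOLVE"]:
    state = step(state, event)
print(state)  # success
```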
"Diagrams are code" exists as ladder diagrams for PLC, from which Structured Text can be derived, which typically are used by PICs for simple to estimate time behavior in very time-sensitive/critical (real-time) control applications.
Stack + dynamic memory and other system resource modeling would need proper models with especially memory life-time visualization being an open research problem, let alone resource tracking being unsolved (Valgrind offers no flexible API, license restrictions, incomplete and other platforms than Linux are less complete etc).
Reads and writes conditioned on arithmetic have all the issues related to complex arithmetic and I am unaware of (time) models for speculation or instruction re-ordering, let alone applied compiler optimizations.
Fascinating approach with ologs and FSMs. I've been working on PAELLADOC (https://github.com/jlcases/paelladoc), which applies MECE principles to documentation structure. Similar philosophy, but focused on knowledge-representation correctness rather than software behavior.
The results align with your experience - structured conceptual foundations make development more intuitive and documentation more valuable for both humans and AI tools.
I find this highly unlikely. My first day at Amazon I encountered an engineer puzzling over a perfect sine wave in a graph. After looking at the scale I made the comment "oh. you're using 5 minute metrics." Their response was "how could you figure that out just by looking at the graph?" When I replied "Queuing theory and control theory," their response was "what's that?"
Since then, Amazon's hiring practices have only gotten worse. Can you invert a tree? Can you respond "tree" or "hash map" when you're asked what is the best data structure for a specific situation? Can you solve a riddle or code an ill-explained l33tcode problem? Are you sure you can parse HTML with regexes? You're Amazon material.
Did you pay attention to the lecture about formal proofs, TLA+, or Coq/Kami? That's great, but it won't help you get a job at Amazon.
The idea that formal proofs are used anywhere but the most obscure corners of AWS is laughable.
Although... it is a nice paper. Props to Amazon for supporting Ph.D.'s doing pure research that will never impact AWS' systems or processes.
They specifically mention "the P team at AWS". The two following things are perfectly able to be simultaneously true:
* The average Amazon engineer is not expected to have awareness of CS fundamentals that go beyond LeetCode-y challenges
* The average Amazon engineer builds using tools which are each developed by a small core team who _are_ expected to know those fundamentals, and who package up those concepts to be usefully usable without full understanding
(I did 10 years on CDO, and my experience matches yours when interacting with my peers, but every time I interacted with the actual Builder Tools teams I was very aware that we lived in different worlds)
> The average Amazon engineer is not expected to have awareness of CS fundamentals that go beyond LeetCode-y challenges
I find this a bit unsettling. There are dozens of great CS schools in the US. Even non-elite BSc programs in EU sometimes teach formal methods.
There are also some good introductory books now, e.g. [1]. Perhaps it's time to interview more on concepts and less on the algorithmic tricks favored by LeetCode?
I doubt current undergrads can't go beyond LeetCode-like challenges.
[1] Formal Methods, An Appetizer. https://link.springer.com/book/10.1007/978-3-030-05156-3
Modern CS programs teach to what they perceive to be the interview their students will encounter after graduation: what a tree data structure is, how to craft a SQL query, and how to calculate a CRC with a Python library. More advanced CS/CE departments still teach discrete math and compilers/parsing for students intending to go to grad school.
My experience with recent CS grads is that it's easier to hire Art and Political Science grads and take the time to teach them programming and all its fundamentals. At least they won't argue with you when you tell them not to use regexes to parse HTML.
We recently did a proof of concept with P for our system, and the reception was warm enough that I expect adoption within the year. I wouldn’t exactly call us obscure, at least in the sense used above—greenfield big data system with a bog-standard mix of AWS services.
I will say that time to productivity with P is low, and, in my experience, it emphasizes practical applicability more so than traditional modeling languages like TLA+ (this point is perhaps somewhat unsubstantiated, but the specific tooling we used is still internal). That is to say, it is fairly easy to sell to management, and I can see momentum already starting to pick up internally. No Ph.D. in formal methods required.
And re: the hiring bar, I agree that the bulk of the distribution lies a bit further left than one would hope/imagine, but there is a long tail, and it only takes a handful of engineers to spearhead a project like this. For maintainability, we are betting on the approachable learning curve, but time will tell.
Thanks for the interesting insight.
1. What would you recommend for modeling multi-thread or multi-process problems on the same system? Is there a good read with recommendations?
2. Is there a good overview of the trade-offs between CSP, the Actor model, and other approaches for this?
3. Are you aware of any process group models?
The problem (in my mind) with using P is that only a small % of problems are formalizable as communicating state machines.
In your greenfield big data system, what were the state machines? What states could they be in?
Cool anecdote but I'm not sure how reasonable it is to expect every person to have expert domain knowledge and recall of every single computer science field just because they got a job to work at Amazon or any other MAANG company.
OK, I will be the useful idiot. I don't fully understand your anecdote. Could you explain what is exactly it that you perceived and that the other engineer failed to see?
Periodic, regular cycles (a sine wave) suggest that over-correction is happening in the system, preventing it from reaching a stable point. More frequent measurements or more tempered corrections may be called for.
Concrete example: the system sees that its queue utilization is high, so it throttles incoming requests, but for too long; on the next check the queue looks super healthy, so the throttle is removed, but again too much time passes before the next check, and now the utilization is too high again.
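A toy Python simulation of that feedback loop (all numbers invented for illustration): with a short check interval the queue hovers near the target, while with a long one it swings in large, regular cycles.

```python
def simulate(check_interval, ticks=60, arrival=20.0, capacity=12.0, target=50.0):
    """Queue depth over time under a bang-bang throttle checked every `check_interval` ticks."""
    queue, admit, history = 50.0, 1.0, []
    for t in range(ticks):
        if t % check_interval == 0:
            # Over-correcting controller: throttle hard above target, open up fully below.
            admit = 0.2 if queue > target else 1.0
        queue = max(0.0, queue + arrival * admit - capacity)
        history.append(round(queue, 1))
    return history

print(simulate(check_interval=1))   # stays close to the target
print(simulate(check_interval=15))  # large, regular swings: the "sine wave"
```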
A graph of what?
And what teams use these methods exactly?
The article directly mentions several.
> Teams across AWS that build some of its flagship products—from storage (e.g., Amazon S3, EBS), to databases (e.g., Amazon DynamoDB, MemoryDB, Aurora), to compute (e.g., EC2, IoT)—have been using P to reason about the correctness of their system designs.
Later it more specifically mentions these (I probably missed a few):
S3 index, S3 ShardStore, Firecracker VMM, Aurora DB engine commit protocol, The RSA implementation on Graviton2
(EDIT: formatting)
If you are wondering "why do formal methods at all?" or "how is AWS using P to gain confidence in correctness of their services?", the following re:Invent 2023 talk answers this question, provides an overview of P, and its impact inside AWS: (Re:Invent 2023 Talk) Gain confidence in system correctness & resilience with Formal Methods (Finding Critical Bugs Early!!) - https://www.youtube.com/watch?v=FdXZXnkMDxs
Github: https://github.com/p-org/P
https://aws.amazon.com/verified-permissions/ (AVP) is a team and product which uses the formally verified Cedar language. https://arxiv.org/abs/2403.04651
Great April 1st gag. Seems to have gone over everyone's head.