> If a function accepts a value that can be String or null, it is ok to pass it a value that is guaranteed to be a String and not null. Or in other words, String is a subtype of String?!
That's... not true? The language can just have an implicit conversion rule for convenience.
> properties like nullability that don’t affect memory layout
That's not true either? It might not affect memory layout because all your objects are boxed or you have niche value optimisation you can use to tag nulls, but make an unboxed integer nullable and you'll definitely see your memory layout be affected.
> That's... not true? The language can just have an implicit conversion rule for convenience.
It is true regardless. What you are talking about is a language implementation detail that is orthogonal to reasoning about the type system. Depending on your definition (reasonable people disagree, and it can be contextual), memory layout is also rarely a concern when modeling type theory.
However, I can't make sense of the idea that "most languages don't have subtypes". Every language I can think of above ASM has at least some. In C, int8_t is a subtype of int16_t, for example.
And the idea that this is new territory... Here they contrast subtypes in C++ to that of "the new language Java".
https://www.cs.princeton.edu/courses/archive/fall98/cs441/ma...
I agree with the comment that OP really should make it explicit what they mean. It presents itself as "what are subtypes", but they unfortunately use the same term for different concepts in the same text, which I guess contributes to the wider confusion in this thread. Inserting a few "Algebraic ..." qualifiers and making the context explicit in a place or two would help.
Or put differently: Algebraic subtypes are a subtype of subtypes.
> What you are talking about is a language implementation detail that is orthogonal to reasoning about the type system.
You can handwave away these implementation details in a high-level language, but a thoughtfully-designed low-level language needs to consider whether or not any given language feature can feasibly be implemented without imposing an undesirable runtime cost.
This is also a pretty poor example for us to index on, because the idea of a function taking a `String?` (or `Option<String>` or whatever) is already so bizarre that it naturally derails the discussion for want of a better example. It only makes sense to write such a function explicitly for users who already have the nullable/optional type; if you have the non-nullable/non-optional version of the type, you should just be calling functions defined on that type.
> The language can just have an implicit conversion rule for convenience
the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation
> make an unboxed integer nullable ...
I don't think any language allows this; in any case disallowing nullability for unboxed types amounts to the observation that `P !<: P?`, where !<: is "does not subtype"
I believe (unless I have misunderstood you) that both your examples are subsumed (heh) by subtyping
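For a concrete sketch of the first point, here is how it looks in TypeScript, using `string | null` as a rough stand-in for `String?` (an analogy, not the same mechanism as Kotlin's; the names are made up):

    function greet(name: string | null): string {
      // The callee must handle the null case before using the string.
      return name === null ? "hello, stranger" : `hello, ${name}`;
    }

    const definitely: string = "world";
    greet(definitely);                         // ok: a string is usable wherever string | null is expected

    const maybe: string | null = definitely;   // ok: the "upcast" needs no ceremony
    // const back: string = maybe;             // error: string | null is not assignable to string

The conversion only goes one way for free, which is exactly the `T <: T?` reading.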
> the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation
Not necessarily, because you might consider it acceptable for the implicit conversion to change the memory layout in this sense.
> I don't think any languages allow this
Plenty do, e.g. Rust's Option works that way.
> in any case disallowing nullability for unboxed types amounts to the observation that `P !<: P?`, where !<: is "does not subtype"
Saying the same thing with fancier words doesn't explain anything. The point is you can't simply treat non-nullable as a subtype of nullable in general, because this case exists.
> Not necessarily, because you might consider it acceptable for the implicit conversion to change the memory layout in this sense.
Does the actual data layout impact the observation?
If you have A, something that accepts B, and you consider it implicitly possible for an A to be a B with either no change or an implicit change... that seems to amount to considering As to be Bs when necessary.
> If you have A, something that accepts B, and you consider it implicitly possible for an A to be a B with either no change or an implicit change... that seems to amount to considering As to be Bs when necessary.
The fact that an A can be implicitly converted to a B in this context does not mean that an A should always be implicitly converted to B. (In this case B is effectively a pointer/reference to A; implicitly forming that reference is a useful convenience feature in some contexts, but I don't think treating A as a subtype of reference to A in the general case is a good idea)
But isn't it being implicit what makes it a subtype? It doesn't have to be a pointer; integer coercion should also be considered a form of subtyping.
I don't know what to make of context here; if you're referring to whether you have a pointer to a type as context, I think it makes more sense to consider that part of the type (i.e. a pointer to A is a subtype of a pointer to B, while A is not a subtype of B)
My point is that allowing an A to implicitly convert to a pointer to A when calling a function that takes a pointer to A is reasonable, whereas always allowing that implicit conversion anywhere in your program is rather less so. The same applies to integer conversion; there are contexts in which it should maybe happen implicitly, but not everywhere.
> The point is you can't simply treat non-nullable as a subtype of nullable in general, because this case exists.
But surely, you can still use subtyping in other cases -- when it is already unboxed -- right?
Like so: `T <: T?` for all boxed `T`.
> But surely, you can still use subtyping in other cases -- when it is already unboxed -- right?
Well, maybe. But you actually have to do the legwork of figuring out which cases are subtypes and which aren't; unboxed was the first case that comes to mind, there might be others. Restating the same thing using symbols doesn't help make anything clearer, it just gets in the way.
> the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation
So you assert that numbers and strings are the same type because PHP and Javascript will implicitly convert back and forth? That any C++ implicit constructor creates a subtyping relationship?
> I don't think any language allows this
Rust, Swift, Zig, just off the top of my head. C# too; value types are where and why it first introduced explicit nullability (back in C# 2.0).
> because PHP and Javascript will implicitly convert back and forth
If you can provide (valid!) methods `T -> U` and `U -> T` for two types, why wouldn't `T = U` hold? (At least for types where `=` makes sense.)
This is the definition I am using:
> S is a subtype of T, written S <: T, if a value of type S can safely be used in any context where a value of type T is expected.
from Pierce's "Software Foundations"
Of course, you may not want to make `String <: Number` and `Number <: String` (and thus `String = Number`), because there is no sensible way to do so; but this is an issue with the way PHP/JS handles subtyping, not with the notion of subtyping itself and certainly does not apply to the example of nullable types.
Unfortunately, I am not familiar enough with C++ to comment on the other question, sorry!
Yes, numbers and strings are the same type in Javascript and PHP. And yes, a C++ constructor declares a subtype relation.
That's the semantics of those languages.
Yeah I think this is "Deref coercion" in Rust.[0]
For example since Vec can Deref to Slice, you can syntactically run Slice methods on a Vec; meaning Vec is a subtype of Slice.
[0] https://doc.rust-lang.org/std/ops/trait.Deref.html#deref-coe...
As I revisit, this also comes up more explicitly when dealing with lifetimes as a type, and stacked borrows.
[0] https://doc.rust-lang.org/nomicon/subtyping.html
Lifetimes are the only subtyping relationship in Rust, yes.
> The language can just have an implicit conversion rule for convenience
By "it is ok pass a String where we expect a String?", I think he means that no such program can go wrong, therefore this is sound. Whether you do this via an implicit conversion or a subtyping relation doesn't really matter.
I don't think subtyping and implicit coercions are really equivalent. What if your implicit coercions aren't transitive, or if different chains of possible implicit coercions give different results, or if implicit coercions change the in-memory representation of a value? In the last case, for example, reflectively examining the dynamic type will give different results.
They aren't equivalent, they're in different domains and abstraction levels. Subtypes are an abstract notion in type theory and grammar while coercions are implementation details/syntax.
> What if your implicit coercions aren't transitive
Have an example in mind?
I think implicit coercions are a semantics issue, not an issue of implementation details.
An example of intransitive implicit coercions: perhaps you have designed a language such that there is an implicit coercion from Booleans to integers (perhaps producing either 0 or 1), and also an implicit coercion from nullable references to Booleans. Arguably you have already sinned, but the situation becomes far worse if you apply transitive closure to the implicit coercions, because now a nullable reference to the integer 53 can be implicitly coerced to an integer—but the integer is 1, not 53!
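To spell out the failure mode, here is a small TypeScript sketch where the two coercions are written out as explicit helper functions (the names and the `Ref` alias are invented for illustration, not anyone's real API):

    type Ref<T> = T | null;                       // a "nullable reference"

    const boolToInt = (b: boolean): number => (b ? 1 : 0);
    const refToBool = (r: Ref<number>): boolean => r !== null;

    // The transitive closure of the two coercions:
    const refToInt = (r: Ref<number>): number => boolToInt(refToBool(r));

    refToInt(53);    // 1, not 53: the composed coercion silently loses the value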
> I think implicit coercions are a semantics issue, not an issue of implementation details.
Why not both?
> An example of intransitive implicit coercions...
Ok? Arbitrary nullable references are obviously not a subtype of booleans no matter what you do. If anything may break by substituting one type for another, those types do not have that subtype relationship, by definition.
It seems like you aren't thinking very clearly.
I didn't say they were equivalent, I said they were both sound in that context. By "doesn't really matter", I meant in the sense that I had just described, eg. that it "cannot go wrong".
I agree, if we're talking about Milner's sense (he means roughly "can't produce a runtime error").
In Swift, String? is an alias to Optional<String>. Optional is an enumeration with two cases: some, and none, where the former has the non-nil value attached. I‘m not sure how that ties in with subtyping though.
I think that whole section of the post was "For example, in Kotlin," presumably these are true in that language.
Maybe I'm off the mark here, but in my mind the definition of subtype is as follows: t1 is a subtype of t2 if any instance of t1 is an instance of t2. Surely with this definition String is a subtype of String?. Every instance of String (i.e. every string) is an instance of String?, after all.
Post-submit clarity edit: Ah, I think I understand. You're objecting to OP's definition of subtyping, as it breaks down for certain language implementations.
String <: String? is true (Any non-null string is a possibly null string), but String? <: String isn’t true (Any possibly null string may not be a non-null string).
We're all in agreement there it seems.
If you just state it as a logical quantified proposition, it’s pretty obvious unless the type is complicated (then you need to break it down into multiple logical quantified propositions).
Yes of course, but again, I don't think anyone disagrees with you on this.
Implicit conversions from Foo? to Foo are a major footgun, because the point of returning a Foo? is to have the caller check for failure before assuming there's a Foo to use. Even C++ avoids the implicit conversion from optional<T> to T.
Does C++ support implicit coercion from T to optional<T>? I think it doesn't, but that would be the corresponding case.
Yes, but they were describing an implicit conversion going the other way around, eg. String <: String?.
> existing programming languages have little or no subtyping [...] If you remember the fad for Object Oriented Programming
I was under the impression that even today, most code is written in an object-oriented style (or at least, written in languages that support OOP). Therefore, most languages support subtyping via inheritance. Is that not true?
> I was under the impression that even today, most code is written in an object-oriented style (or at least, written in languages that support OOP).
Whoa there - those are very different claims! Most modern languages are multi-paradigm. You can write Python in an OO way, or a data oriented way, or a functional way, or whatever you like. The same is true of JavaScript/typescript. And modern Java. And lots of languages.
Just because the language has the class keyword doesn’t mean everything can be subtyped. If you have an entity-component system, you can’t necessarily inherit from your Entity type. Likewise I can’t inherit from a react component, just because JavaScript has classes. (At least not using modern react)
> Most modern languages are multi-paradigm. [..] And modern Java.
Modern Java is still very definitely OOP and not multi-paradigm. Other JVM languages, like Scala or Kotlin, add more multi-paradigm features to that ecosystem, but modern Java doesn't.
Java has made some big strides here to catch up with the functional and data-oriented features other languages are all adding.
I was pretty surprised and impressed to see a modern example lately. There’s Record data types, stronger type inference with `var`, pretty great structural pattern matching, switch expressions (that return a value), stream apis for map, filter, and reduce etc. All of these move it further away from strictly OOP.
And separate but still cool is a pretty neat virtual thread system too. Java has definitely had a bit of a glow up since I learned it in uni.
It tries to be OOP, but you can still implement a program with int, int[] and static methods.
And you can’t subtype those.
Other JVM languages, like scala, are more OO than Java.
> Whoa there - those are very different claims!
I suspect that the stronger claim is true - but even the weaker claim would invalidate the article's statements that "existing programming languages have little or no subtyping" and that OOP was a "fad" and has been left in the past.
Sate your curiosity and read the next 21 words following your quote.
> , “subtyping” might call to mind classes and inheritance hierarchies. However, subtyping is a far more basic and general notion than that.
That does not address the fact that yes, inheritance is one type of subtyping polymorphism, and that many languages have it via the OO paradigm.
Yes, there are other forms of subtyping, but saying that many languages don't have subtyping just because they don't have these other forms of subtyping is either cluelessness or satire.
You're absolutely correct, just look at the top languages and what's idiomatic in them
Some languages have separate inheritance and subtyping relations, or have inheritance and no static typing, but yes, this statement makes it clear that the author is either astoundingly clueless, intentionally dishonest, or joking.
Seems like "Why You Need Algebraic Subtyping" might be a better title for this post.
I think a more common name for this is structural typing, so it's not confused with subtyping, which is a kind of polymorphism.
Algebraic subtyping and structural typing are not the same thing. Algebraic subtyping is a specific approach to type inference and checking of subtyping, which may or may not be used with structural types.
I hear what the author is saying, but I think if composability is your goal, you should look no further than traits/interfaces. Accomplishes the same goals but actually far more general (which, according to the author, is a good thing)
Trait and interface systems use a subtyping relation; they aren't alternatives to a subtyping relation.
I think subtyping is the general concept in the programming language / type literature and interfaces / traits are an example of subtyping. If A implements interface B, A is a subtype of B.
thanks for clarifying that!
> However, existing programming languages have little or no subtyping
Every "type" you use in Ada is actually a subtype.
In Ada, "A subtype of a given type is a combination of the type, a constraint on values of the type, and certain attributes specific to the subtype".
You don't ever refer to actual types in Ada, since creating a type with "type" actually creates an anonymous type and the name refers to the first subtype.[1]

    type Token_Kind is (
        Identifier,
        String_Literal,
        -- many more ...
        -- These are character symbols and will be part of Token_Character_Symbol.
        Ampersand,
        -- ...
        Vertical_Bar);

    -- A subtype with a compile-time checked predicate.
    subtype Subprogram_Kind is Token_Kind
      with Static_Predicate => Subprogram_Kind in RW_Function | RW_Procedure;

    subtype Token_Character_Symbol is Token_Kind range Ampersand .. Vertical_Bar;
> If you have a pointer with more permissions, it should still be usable in a place that only requires a pointer with fewer permissions

This is exactly how "accessibility" of access types works in Ada[2]. If you have a pointer ("access type") to something on the heap, you can use it wherever an anonymous access type can be used. You can also create subtypes of access types which have the constraint of a "null exclusion".
[1]: https://ada-lang.io/docs/arm/AA-3/AA-3.2#p1_3.2.1
[2]: https://ada-lang.io/docs/arm/AA-3/AA-3.10
What is the meaning of the word “boxed” in this article? Bit confused about it.
If you're familiar with Rust, it's what's different about Box<T> compared to T. If you know Java, all of its "objects" are boxed; it's the difference between Integer and the primitive int type.
In C++ it's approximately std::unique_ptr<T>. In a lot of the high level languages (indeed Java, which I mentioned above, fits this if you're not familiar with the details), boxing is silently done for you, and you are unaware of it unless you care about the fine details.
Think of an arbitrary type Goose, where "is" the Goose? Assume for a moment there's data associated so the Goose does need to be stored somewhere, the two usual candidates are "the stack" and "the heap". If you've no idea what those are then I'm afraid you need a bit more CS to really engage with this conversation. Boxing puts things on the heap, typically with just a pointer to that box kept on the stack.
Not represented directly in memory in its raw form right where the value is placed, but rather stored somewhere else (usually the heap) / in an opaque manner and accessed via an indirection, such as a pointer / reference. Often involved in making things polymorphic, because different types usually have different representations in memory even if they implement the same interface, so they need to be boxed.
Boxing means transforming a variable of a value type into a variable of reference type.
    // boxing
    var a = 5;        // int, a value type
    object b = a;     // boxed into an object, a reference type
Unboxing is the reverse process of transforming variables of reference types to value types.
A value type variable holds the value directly, while a reference type variable holds a reference that points to a memory address where the actual value is found.
This is true in some circumstances but not in general. Consider how integers are represented in Emacs Lisp, OCaml, or Smalltalk. They're unboxed, but boxing them like CPython does wouldn't change their semantics, so it doesn't seem correct to say that it would change their type. And there's no way in the language to box them. Fundamentally, boxing is a question about implementation, not semantics.
I think a more generally correct statement would be:
> An unboxed value can be held in CPU registers directly, while a boxed value is referenced through a memory address where the actual value is found.
Usually it's used for saying something is heap allocated.
In this case it doesn't seem very relevant; it looks like the article just assumes boxed records are dynamically or structurally typed, even though that's completely unrelated.
> Boxing is the process of converting a value type (on the stack, passed by value) to the type "object". When the runtime boxes a value type, it wraps the value inside a object instance and stores it on the managed heap (passed by reference). Unboxing extracts the value type from the object.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-...
Subtyping is one form of type polymorphism. Others include:
- parametric polymorphism (generics, templates, ...)
- ad hoc polymorphism (type classes are the nicest example)
- https://en.wikipedia.org/wiki/Row_polymorphism I think this one is the most relevant for the record discussion.
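TypeScript doesn't have row polymorphism, but a bounded generic gives a rough approximation of what it buys you for records (a sketch with made-up names): with plain subtyping the extra fields vanish from the result type, with the generic they survive.

    type Point2 = { x: number; y: number };

    // Subtyping only: whatever record you pass in, the static result type is Point2.
    function viaSubtyping(p: Point2): Point2 {
      return { x: p.x, y: p.y };
    }

    // Bounded generic: the caller's full record type flows through unchanged.
    function viaGeneric<T extends Point2>(p: T): T {
      console.log(Math.hypot(p.x, p.y));   // may still use x and y
      return p;
    }

    const p = { x: 1, y: 2, z: 3 };
    const a = viaSubtyping(p);   // a: Point2 (z is gone from the type)
    const b = viaGeneric(p);     // b: { x: number; y: number; z: number }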
After reading the article, I propose this gist / title:
"In high level programmming languages, you expect subtyping as a given",
since we expect a lot of features to work as if subtyping was the way it was implemented?
This would be a lot more convincing if served with an example of a language where subtyping doesn't lead to more rough edges.
Dude seems to be coming from a strange place if he thinks that subtyping is uncommon in programming languages. Java, Kotlin, Scala, C++, Python, Go, and C# all have subtyping of one sort or another, and that covers at least 80% of the code being written today.
So is that basically duck typing?
The example using an open record is an implementation of safe duck typing
Checking type unions and inferring information about conditional branching isn't really duck typing
> In this case, {x: int, y: int, z: int} is a subtype of {x: int, y: int}
It seems like the former should be a supertype?
No, because all {x, y, z}s are {x, y}s, but the opposite is not true.
Said differently, any function that applies to the general {x, y} also applies to the more specific {x, y, z} (it will just not use the z part) but a function that requires a {x, y, z} specifically will not work with all {x, y}s.
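As a concrete sketch in TypeScript, whose structural typing behaves this way (the names are invented):

    type Point2 = { x: number; y: number };
    type Point3 = { x: number; y: number; z: number };

    function norm2(p: Point2): number {
      return Math.hypot(p.x, p.y);     // only ever touches x and y
    }

    const p3: Point3 = { x: 3, y: 4, z: 12 };
    norm2(p3);                         // ok: every {x, y, z} is also an {x, y}

    const p2: Point2 = { x: 3, y: 4 };
    // const q: Point3 = p2;           // error: p2 is not guaranteed to have z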
In any case, it clearly does not follow from his definition of what a subtype is, since as soon as something depends on memory size, the interoperability is lost.
Not necessarily? Usually the only operation that depends on memory size is constructing a new object of a type. With respect to memory size, operations on existing objects only depend on the memory size being at least big enough, other than maybe things like .clone().
General -> specific
Animal -> Dog
(and a Dog will have more fields than an animal)
Ah, so. That helps. Thank you.
Okay, I can pass {x:int,y:int,z:int} to a function of type {x:int,y:int}, so what? Where is the argument for ditching nominal typing in the first place?
I disagree with this article. Subtyping is what we have in TypeScript and it creates a lot of problems due to covariance.
You're confusing covariance itself with having covariant mutable collections. The former is a great feature, the latter a design mistake that makes the type system unsound.
Can you give some examples? How does this cause problems in typescript?
Presumably the issue lies with mutable collections.
If you have a type Animal, with subtypes Dog and Cat, then covariance means a list of Cat is a subtype of a list of Animal. The problem is that appending a Dog is a legal action for a list of Animals, but not a legal action for a list of Cats.
To expand on this, you could be maintaining a list of Cats, call a function which accepts a list of Animals, and that function could append a Dog without causing a type error.
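TypeScript happens to allow exactly this, because its arrays are covariant even though they are mutable. A minimal sketch (all names invented):

    interface Animal { name: string }
    interface Cat extends Animal { meow(): void }
    interface Dog extends Animal { bark(): void }

    function addStray(animals: Animal[]): void {
      const rex: Dog = { name: "Rex", bark: () => {} };
      animals.push(rex);   // fine: a Dog is an Animal
    }

    const cats: Cat[] = [{ name: "Mia", meow: () => {} }];
    addStray(cats);        // accepted: Cat[] is treated as a subtype of Animal[]
    cats[1].meow();        // compiles, but throws at runtime: the new element has no meow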
https://github.com/python/mypy/issues/4976 is an interesting discussion on this in Python: should a TypedDict be allowed where a dict argument is accepted? The basic consensus is no, because then one could modify the dict, adding keys that wouldn't be allowed in the specific TypedDict.
But this means that often you have to do cast(dict, my_typed_dict) to be able to interop with third party libraries that (quite naturally) believed that by typing their arguments as accepting a dict, they'd be accepting all dict-like things.
The solution is to encode mutability. So that you can say:
Const list<cat> is a subtype of Const list<animal>. But that does get complicated, because the difference between a mutable container of immutable elements and an immutable container of mutable elements is hard to get your head around, and verbose to capture in syntax.
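In TypeScript terms, `ReadonlyArray` plays the role of `Const list`, and the covariance becomes harmless because nothing can be appended through it. A sketch (invented names):

    interface Animal { name: string }
    interface Cat extends Animal { meow(): void }

    function listNames(animals: readonly Animal[]): string[] {
      // animals.push(...) would not compile: ReadonlyArray has no push
      return animals.map(a => a.name);
    }

    const cats: readonly Cat[] = [{ name: "Mia", meow: () => {} }];
    listNames(cats);   // safe: reads only ever observe Animals, writes are ruled out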
Wait, if I recall correctly, covariance has long been established as a mistake. I would guess that if TypeScript had adopted contravariance the problems you mention would go away. We could then argue that TypeScript does need subtyping (like everyone else), as long as it’s contravariant.
My point is, we can’t say TypeScript doesn’t need subtyping just because they botched it. As such, it may not be a good counter example of the article’s thesis.
> Wait, if I recall correctly, covariance has long been established as a mistake.
Perhaps you're just missing some words here, but, just for clarity: it doesn't make any sense to say that covariance is a mistake. Covariance applied in specific places, like Java's mutable covariant arrays (which leads to unsoundness), can be a mistake, but covariance itself is fine and essential in languages with subtyping. Function parameters should be covariant, function returns should be contravariant, mutable data structures should be invariant, etc.
> function returns should be contravariant
I'm not very familiar with these relations, but shouldn't function returns be covariant? `String => Cat` is a subtype of `String => Animal`?
> Function parameters should be covariant
For function parameters, doesn't it depend on how the parameter is used?
> I'm not very familiar with these relations, but shouldn't function returns be covariant? `String => Cat` is a subtype of `String => Animal`?
You're right :) I mixed up covariance and contravariance for function parameters and return value in my comment.
> For function parameters, doesn't it depend on how the parameter is used?
I don't think so, but maybe there's specific circumstances I don't know of? A function's parameter type is a constraint on _input_ to the function. Changing that to a subtype amounts to the function receiving arguments that satisfy a stronger constraint. That seems like something that would hold no matter how the parameter is used?
> > For function parameters, doesn't it depend on how the parameter is used?
> I don't think so, but maybe there's specific circumstances I don't know of?
I don't know specific circumstances either, but I presume they exist because of things like Dart's `covariant` keyword [0], which makes function parameters covariant instead of contravariant.
[0] https://dart.dev/language/type-system#covariant-keyword
The use case would be where the arguments are expected to be filled in by the function - ie, as additional return types.
Consider if we have `A <= B <= C`, and a function:
Given some potential inputs, the following should be true:

    Copy(adest, asrc); // No! - How would Copy know how to copy `A` values when it only knows about `B`?
    Copy(adest, bsrc); // No! - src argument is OK, but how can it downcast them to `A`?
    Copy(adest, csrc); // No! - Same as above, and src elements must be at least `B`s.
    Copy(bdest, asrc); // Ok. - Any `A` in the src are interpreted as `B`s.
    Copy(bdest, bsrc); // Ok. - All values are interpreted as `B`s.
    Copy(bdest, csrc); // No! - Argument elements must be at least `B`s.
    Copy(cdest, asrc); // Ok. - Values are interpreted as `B`s in src, and as `C`s in dest.
    Copy(cdest, bsrc); // Ok.
    Copy(cdest, csrc); // No! - Argument elements must be at least `B`s.

If the `dest` argument were contravariant, it would permit invalid copies and forbid valid ones. In a purely functional setting, you should not need a covariant parameter type, because all writes would end up in the return type.

Maybe this is intentionally introducing logical unsoundness into Dart's type system for pragmatic purposes, perhaps supplemented with some implicit dynamic checks?
Read-only parameters should be contravariant, write-only parameters should be covariant, and read-write parameters should be invariant.
`String => Cat` and `String => Animal` are the types of the functions themselves - not the arg and return types.
If we consider inside the function:
However, outside of the function, when calling it, the opposite is true.

    let result = foo (arg)
    ;; String <: arg
    ;; result <: Animal

It's easy to get them mixed up when looking at it from the wrong perspective - but we should be looking at functions from the outside, not the inside - so yes, parameters should be contravariant and return types covariant.

> parameters should be contravariant
Always? In that case, do you know why Dart has a `covariant` keyword [0], which makes function parameters covariant?
[0] https://dart.dev/language/type-system#covariant-keyword
Others have pointed out that "covariance is a mistake" is nonsense, because, if you have subtyping, it's perfectly correct to make function types covariant on return types and contravariant on argument types. And it's useful.
To switch up the examples a bit, Natural is a subtype of Integer, so it's perfectly valid to make Natural => Natural (the type of IntegerSqrt) a subtype of Natural => Integer (the type of functions returning integers given a natural-number argument, such as n => (-2)**n), and to make Integer => Natural (the type of IntegerAbs) a subtype of Natural => Natural. If someone is expecting a Natural => Natural function, no surprises will result if you secretly smuggle them IntegerAbs. They just won't happen to call it with a negative argument.
And others have pointed out that covariance and contravariance are both logically unsound for mutable container types like arrays. In particular, covariance would allow you to infer Vector<Natural> is a subtype of Vector<Integer>, which, together with the function subtyping rules above, lets you pass a Vector<Natural> to a function like a => a[0] := -1. That will store a negative number into the Vector<Natural>, and down the line, that -1 will be incorrectly used in Natural calculations, possibly producing incorrect results. Contravariance is no better, because then functions like a => IntegerSqrt(a[0]) fall down go boom.
What I haven't seen anyone mention yet is that covariance is perfectly sound for immutable container types. If you have an immutable List container type supporting Car, Cdr, Cons, and IsEmpty functions, no surprises will result from passing a List<Natural> to a function expecting a List<Integer>. It can add -1 to the List with Cons(-1, xs) but that doesn't mutate the original List; it returns a new, longer List, which is already statically typed as a List<Integer>.
This is far from an original observation, so I was surprised not to see it mentioned.
Nothing obliges you to add function or immutable-container subtyping to your language just because you have subtyping of some kind. You could require implicit or explicit adaptors to be inserted in cases like the above, perhaps eliding those adaptors as a compilation optimization. Logically, that's a perfectly sound thing to do. I don't know why you would want to. It seems inconvenient. But maybe you do.
Correct solution: Contravariance for arguments, Covariance for return types.
Another way to put it, if I create a subtype, then the subtype's functions are allowed to be more general in what they accept, and more specific in what they return.
This implies that there is no subtype relation between X[Supertype] and X[Subtype].
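Under TypeScript's `strictFunctionTypes`, function types follow exactly these rules; a minimal sketch (all names invented):

    interface Animal { name: string }
    interface Cat extends Animal { meow(): void }

    type MakeCat = (a: Animal) => Cat;     // accepts more, returns something more specific
    type MakeAnimal = (c: Cat) => Animal;  // accepts less, returns something more general

    const makeCat: MakeCat = (a) => ({ name: a.name, meow: () => {} });

    // ok: makeCat accepts more than required (any Animal) and returns something
    // more specific than required (a Cat), so it can stand in for a MakeAnimal.
    const asMakeAnimal: MakeAnimal = makeCat;

    // const bad: MakeCat = (c: Cat) => c;  // error under strictFunctionTypes:
    //                                      // a Cat-only function cannot stand in for
    //                                      // one that must accept any Animal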