hutao 4 days ago [-]
Note that the division-by-zero example used in this article is not the best example to demonstrate "Parse, Don't Validate," because it relies on encapsulation. The principle of "Parse, Don't Validate" is best embodied by functions that transform untrusted data into some data type which is correct by construction.
Alexis King, the author of the original "Parse, Don't Validate" article, also published a follow-up, "Names are not type safety" [0], clarifying that the "newtype" pattern (such as hiding a nonzero integer in a wrapper type) provides weaker guarantees than correctness by construction. Her original "Parse, Don't Validate" article also includes the following caveat:
> Use abstract datatypes to make validators “look like” parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to “fake” a parser from a validator.
So, an abstract data type that protects its inner data is really a "validator" that tries to resemble a "parser" in cases where the type system itself cannot encode the invariant.
The article's second example, the non-empty vec, is a better example, because it encodes within the type system the invariant that one element must exist. The crux of Alexis King's article is that programs should be structured so that functions return data types designed to be correct by construction, akin to a parser transforming less-structured data into more-structured data.
Even the newtype-based "parse, don't validate" is tremendously useful in practice, though. The big thing is that if you have a bare string, you don't know "where it's been". It doesn't carry with it information whether it's already been validated. Even if a newtype can't provide you full correctness by construction, it's vastly easier to be convinced of the validity of an encapsulated value compared to a naked one.
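A minimal sketch of that newtype-plus-smart-constructor pattern in Rust (hypothetical `NonZero` wrapper; the real std type is `std::num::NonZeroI32`):

```rust
// A hedged sketch of the newtype-with-smart-constructor pattern.
// The field is private, so the only way to obtain a `NonZero` is
// through `new`, which performs the validation exactly once.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NonZero(i32); // hypothetical; std offers std::num::NonZeroI32

impl NonZero {
    pub fn new(n: i32) -> Option<NonZero> {
        if n != 0 { Some(NonZero(n)) } else { None }
    }
    pub fn get(self) -> i32 {
        self.0
    }
}

// Any function taking a `NonZero` can rely on the invariant
// without re-checking it: the value carries "where it's been".
fn safe_div(a: i32, b: NonZero) -> i32 {
    a / b.get()
}

fn main() {
    assert_eq!(NonZero::new(0), None);
    let two = NonZero::new(2).unwrap();
    assert_eq!(safe_div(10, two), 5);
    println!("ok");
}
```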
For full-on parse-don't-validate, you essentially need a dependent type system. As a more light-weight partial solution, Rust has been prototyping pattern types, which are types constrained by patterns. For instance a range-restricted integer type could be simply spelled `i8 is 0..100`, or a nonempty slice as `[T] is [_, ..]`. Such a feature would certainly make correctness-by-construction easier in many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example of the clash between practicality and theoretical purity. It can't offer you a slice (consecutive view) of its elements without storing the first element twice (which requires immutability and that T: Clone, unlike normal Vec<T>), which makes it fairly useless as a vector. It's okay if you consider it just an abstract list with a more restricted interface.
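A hedged sketch of that `(T, Vec<T>)` shape (hypothetical `NonEmpty` type): `first` is total, but note there is no way to hand out one contiguous slice of all elements, since head and tail live in separate allocations.

```rust
// Non-empty list as head + tail: the "one element must exist"
// invariant is encoded in the struct itself.
struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    fn new(head: T, tail: Vec<T>) -> Self {
        NonEmpty { head, tail }
    }
    fn first(&self) -> &T {
        &self.head // always present, by construction: no Option needed
    }
    fn len(&self) -> usize {
        1 + self.tail.len()
    }
    fn iter(&self) -> impl Iterator<Item = &T> {
        std::iter::once(&self.head).chain(self.tail.iter())
    }
}

fn main() {
    let xs = NonEmpty::new(1, vec![2, 3]);
    assert_eq!(*xs.first(), 1);
    assert_eq!(xs.len(), 3);
    assert_eq!(xs.iter().copied().sum::<i32>(), 6);
    println!("ok");
}
```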
rhdunn 3 days ago [-]
It's also useful to wrap/tag IDs in structured types. That makes it easier to avoid errors when there are multiple type parameters such as in the Microsoft graph API.
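A small illustration of that ID-tagging idea (hypothetical `UserId`/`GroupId` names): both wrap a `String`, but the compiler keeps them apart, so swapping arguments at a call site becomes a type error instead of a runtime bug.

```rust
#[derive(Debug, Clone, PartialEq)]
struct UserId(String);
#[derive(Debug, Clone, PartialEq)]
struct GroupId(String);

fn add_member(group: &GroupId, user: &UserId) -> String {
    format!("adding {:?} to {:?}", user, group)
}

fn main() {
    let u = UserId("u-42".to_string());
    let g = GroupId("g-7".to_string());
    // add_member(&u, &g) would not compile: argument order is enforced.
    let msg = add_member(&g, &u);
    assert!(msg.contains("u-42"));
    println!("{msg}");
}
```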
humkieufj 3 days ago [-]
[flagged]
spockz 3 days ago [-]
Coming from Haskell, I loved Agda 2 as a dependent type language. Is there any newer or more mainstream language that has added dependent types?
That whole blog is one big and fascinating rabbit hole into type theory! Thanks for the link!
doctor_phil 3 days ago [-]
Idris is slightly more mainstream I would say, but not wildly so. If you like the Haskell interop then I'd recommend staying with Agda.
Scala 3 is much more mainstream and has path dependent types. I've only used Scala 2, and there the boilerplate for dependent types was frustrating imo, but I've heard it's better in 3.
spockz 3 days ago [-]
Ah yes Idris rings a bell. I’ll try that one again.
Scala 3 indeed is more mainstream, but it seems also on the way out. At least here in the corporate world it is being replaced by Kotlin and Java 21+ for a large part.
instig007 3 days ago [-]
You can have range-constrained numeric types and collections in Haskell via Liquid Haskell, which has almost seamless integration with the compiler starting from GHC-9.12+
yuppiemephisto 1 day ago [-]
lean 4
rapnie 4 days ago [-]
You can also search for "make invalid states impossible/unrepresentable" [0] to find more info on related practices. See "Domain Modeling Made Functional" [1] for a nice example.
The phrasing that I hear more often is "make illegal states unrepresentable"; both the submitted article and Alexis King's original article use this phrase. At least according to https://fsharpforfunandprofit.com/posts/designing-with-types..., it originates from Yaron Minsky (a programmer at Jane Street who is prominent in the OCaml community).
EDIT: Parent comment was edited to amend the "impossible/unrepresentable" wording
rapnie 4 days ago [-]
Yes, sorry. I thought I'd add some resources to it, or it would have been too vague a comment, and found the better phrasing in the process.
CodeBit26 3 days ago [-]
I agree, 'correct by construction' is the ultimate goal here. Using types like NonZeroU32 is a great simple example, but the real power comes when you design your entire domain logic so that the compiler acts as your gatekeeper. It shifts the mental load from run-time debugging to design-time thinking.
fspeech 3 days ago [-]
Yes division is a poor example. It's a poor separation of concerns to try to wrap at this level without usage context. To see the point try to wrap overflows on arithmetic functions.
kibwen 3 days ago [-]
> To see the point try to wrap overflows on arithmetic functions.
The invariant is less obvious, but you could still do this if you really wanted. The observation is that addition of two fixed-size nonnegative integers cannot overflow unless at least one is greater than half the range. So your `non_overflowing_addition` function would need to take two inputs of type `NonNegativeLessThanHalfMaxValue`, where the constructor for the latter type enforces the invariant. Multiplication is similar, but with the square root of the range (and I suppose `NonNegativeLessThanSqrtMaxValue` could be a subtype of `NonNegativeLessThanHalfMaxValue` if you want to be fancy).
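A hedged sketch of this idea for `u8` (hypothetical `HalfRange` name): the constructor only admits values up to `u8::MAX / 2`, so the sum of two of them can never exceed 254.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct HalfRange(u8); // invariant: 0 <= value <= 127

impl HalfRange {
    fn new(n: u8) -> Option<HalfRange> {
        if n <= u8::MAX / 2 { Some(HalfRange(n)) } else { None }
    }
}

// No Option in the return type: the input types carry the proof.
fn non_overflowing_addition(a: HalfRange, b: HalfRange) -> u8 {
    a.0 + b.0 // at most 127 + 127 = 254 <= 255, cannot overflow
}

fn main() {
    assert_eq!(HalfRange::new(128), None);
    let a = HalfRange::new(100).unwrap();
    let b = HalfRange::new(127).unwrap();
    assert_eq!(non_overflowing_addition(a, b), 227);
    println!("ok");
}
```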
hutao 3 days ago [-]
Note that addition also won't overflow if one addend is greater than half the range, but the other addend is still small enough (e.g. for the range -128 to 127, adding 65 + 12 will not overflow, even though 65 is greater than half of 127).
Your intention in having the restricted domain "NonNegativeLessThanHalfMaxValue" is probably so that both addends have the same domain. If you go down that route, perhaps you'll also want the closure property, meaning that the sum should also belong to the same set. However, then you have to deal with the overflow problem all over again...
The real point is that when adding two N-bit integers, the result needs N+1 bits, because the "carry bit" is also part of the output. I think this is a scenario where "Parse, Don't Validate" can't easily help, because the validity of the addition is intrinsically a function of both inputs together.
(p.s. these links are just to satisfy extra-curious readers - no criticism is intended! I add this because people sometimes assume otherwise)
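The "N+1 bits" point above can be shown concretely by widening before adding, so the result type always has room for the carry and no check is needed at all (`widening_add` is a name made up for this sketch):

```rust
fn widening_add(a: u8, b: u8) -> u16 {
    // max is 255 + 255 = 510, which always fits in u16
    a as u16 + b as u16
}

fn main() {
    assert_eq!(widening_add(255, 255), 510);
    assert_eq!(widening_add(65, 12), 77);
    println!("ok");
}
```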
hacker_homie 4 days ago [-]
[flagged]
jaggederest 4 days ago [-]
You can go even further with this in other languages, with things like dependent typing - which can assert (among other interesting properties) that, for example, something like
get_elem_at_index(array, index)
cannot ever have index outside the bounds of the array, but checked statically at compilation time - and this is the key, without knowing a priori what the length of array is.
"In Idris, a length-indexed vector is Vect n a (length n is in the type), and a valid index into length n is Fin n ('a natural number strictly less than n')."
Similar tricks work with division that might result in inf/-inf, to prevent them from typechecking, and more subtle implications in e.g. higher order types and functions
Very cool and practical, but specs aren't dependent typing. (I actually think specs are probably more useful than dependent types for most people)
Dependent typing requires:
- Generic types that can take runtime values as parameters, e.g. [u8; user_input]
- Functions where the type of one parameter depends on the runtime value of another parameter, e.g. fn f(len: usize, data: [u8; len])
- Structs/tuples where the type of one field depends on the runtime value of another field, e.g. struct Vec { len: usize, data: [u8; self.len] }
satvikpendem 3 days ago [-]
Sadly I'm not sure Rust will ever get those sorts of features.
hmry 3 days ago [-]
They've gone the Haskell route of adding a billion features to a non-dependent type system instead.
Not that I blame them, nobody has figured out practical dependent typing yet. (Idris is making good progress in the field though)
VorpalWay 4 days ago [-]
How does that work? If the length of the array is read from stdin for example, it would be impossible to know it at compile time. Presumably this is limited somehow?
marcosdumay 4 days ago [-]
If you check that the value is inside the range, and execute some different code if it's not, then congratulations, you now know at compile time that the number you will read from stdin is in the right range.
mdm12 4 days ago [-]
One option is dependent pairs, where one value of the pair (in this example) would be the length of the array and the other value is a type which depends on that same value (such as Vector n T instead of List T).
Type-Driven Development with Idris[1] is a great introduction for dependently typed languages and covers methods such as these if you're interested (and Edwin Brady is a great teacher).
Not sure about Idris, but in Lean `Fin n` is a struct that contains a value `i` and a proof that `i < n`. You can read in the value `n` from stdin and then you can do `if h : i < n` to have a compile-time proof `h` that you can use to construct a `Fin n` instance.
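A minimal Lean 4 sketch of that step (`tryIndex` is a hypothetical helper name; `Fin` is the real std type):

```lean
-- Converting a runtime index into a `Fin n`: the `if h : i < n` form
-- gives us the proof `h` in the true branch, which is exactly what
-- the `Fin` constructor needs.
def tryIndex (n i : Nat) : Option (Fin n) :=
  if h : i < n then some ⟨i, h⟩ else none
```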
smj-edison 3 days ago [-]
I've heard this can be a bit of a pain in practice, is that true? I can imagine it could slow me down to construct a proof of an invariant every time I call a function (if I understand correctly).
jaggederest 3 days ago [-]
I haven't worked significantly with Lean, but I'm toying with my own dependently typed language. Generally you only have to construct a proof once, much like a type or function, and then you reuse it. Also, depending on the language, there are rewriting semantics ("elaboration") that let you do mathematical transformations to make two statements equivalent and then reuse the standardized proof.
ratorx 4 days ago [-]
It doesn’t have to be a compile time constant. An alternative is to prove that when you are calling the function the index is always less than the size of the vector (a dynamic constraint). You may be able to assert this by having a separate function on the vector that returns a constrained value (eg. n < v.len()).
jaggederest 4 days ago [-]
If the length is read from outside the program it's an IO operation, not a static variable, but there are generally runtime checks in addition to the type system. Usually you solve this as in the article, with a constructor that checks it - so you'd have something like "Invalid option: length = 5 must be within 0-4" when you tried to create the Fin n from the passed in value
rq1 4 days ago [-]
Imagine you read a value from stdin and parse it as:
Maybe Int
So your program splits into two branches:
1. Nothing branch: you failed to obtain an Int.
There is no integer to use as an index, so you can’t even attempt a safe lookup into something like Vect n a.
2. Just i branch: you do have an Int called i.
But an Int is not automatically a valid index for Vect n a, because vectors are indexed by Fin n (a proof carrying “bounded natural”).
So inside the Just i branch, you refine further:
3. Try to turn the runtime integer i into a value of type Fin n.
There are two typical shapes of this step:
* Checked conversion returning Maybe (Fin n)
If the integer is in range, you get Just (fin : Fin n). Otherwise Nothing.
* Checked conversion returning evidence (proof) that it's in range
For example: produce k : Nat plus a proof like k < n (or LTE (S k) n), and then you can construct Fin n from that evidence.
(But it's basically the same: you end up with a "Maybe (LTE …)".)
Now if you also have a vector:
xs : Vect n a
… the n in Fin n and the n in Vect n a are the same n (that’s what “unifies” means here: the types line up), so you can do:
index fin xs : a
And crucially:
there is no branch in which you can call index without having constructed the Fin n first,
so out-of-bounds access is unrepresentable (it’s not “checked later”, it’s “cannot be expressed”).
And within _that_ branch of the program, you have a proof of Fin n.
Said differently: you don’t get “compile-time knowledge of i”; you get a compile-time guarantee that whatever value you ended up with satisfies a predicate.
Concretely: you run a runtime check i < n. _ONCE_
If it fails, you’re in a branch where you do not have Fin n.
If it succeeds, you construct fin : Fin n at runtime (it’s a value, you can’t get around that), but its type encodes the invariant “in bounds”/check was done somewhere in this branch.
esafak 4 days ago [-]
I wish dependent types were more common :(
unixpickle 4 days ago [-]
The `try_roots` example here is actually a _counterexample_ to the author's main argument. They explicitly ignore the "negative discriminant" case. What happens if we consider it?
If we take their "parse" approach, then the types of the arguments a, b, and c have to somehow encode the constraint `b^2 - 4ac >= 0`. This would be a total mess--I can't think of any clean way to do this in Rust. It makes _much_ more sense to simply return an Option and do the validation within the function.
In general, I think validation is often the best way to solve the problem. The only counterexample, which the author fixates on in the post, is when one particular value is constrained in a clean, statically verifiable way. Most of the time, validation is used to check (possibly complex) interactions between multiple values, and "parsing" isn't at all convenient.
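A hedged sketch of the validate-inside approach the comment prefers (hypothetical signature; the article's actual `try_roots` may differ): the discriminant check involves all three inputs at once, so an `Option` return is a natural fit.

```rust
fn try_roots(a: f64, b: f64, c: f64) -> Option<(f64, f64)> {
    let disc = b * b - 4.0 * a * c;
    if a == 0.0 || disc < 0.0 {
        return None; // degenerate equation or no real roots
    }
    let sqrt = disc.sqrt();
    Some(((-b + sqrt) / (2.0 * a), (-b - sqrt) / (2.0 * a)))
}

fn main() {
    // x^2 - 3x + 2 = 0 has roots 2 and 1
    assert_eq!(try_roots(1.0, -3.0, 2.0), Some((2.0, 1.0)));
    // x^2 + 1 = 0 has no real roots
    assert_eq!(try_roots(1.0, 0.0, 1.0), None);
    println!("ok");
}
```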
sbszllr 3 days ago [-]
I was thinking a similar thing when reading the article. Often, the validity of the input depends on the interaction between some of them.
Sure, we can follow the advice of creating types that represent only valid states, but then we end up with `fn(a: A, b: B, c: C)` transformed into `fn(abc: ValidABC)`.
strawhatguy 4 days ago [-]
The alternative is one type, with many functions that can operate on that type.
Like how clojure basically uses maps everywhere and the whole standard library allows you to manipulate them in various ways.
The main problem with the many-type approach is ending up with several similar (or worse, nearly identical) types, all incompatible with each other.
fiddlerwoaroof 4 days ago [-]
Yeah, there's something of a tension between the Perlis quote "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures" and Parse, don't validate.
The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.
eddd-ddde 13 hours ago [-]
Do those design practices protect you when you apply a refactor and now you don't know which call sites may be broken now?
strawhatguy 4 days ago [-]
There's probably a case for both. Core logic might benefit from hard types deep in the bowels of unchanging engine.
The real world often changes though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.
fiddlerwoaroof 3 days ago [-]
Coalton ( https://coalton-lang.github.io ) is the sort of thing I like: a Haskell-style language hosted inside a very dynamic one with good interop.
strawhatguy 3 days ago [-]
Yes it's quite the blend!
packetlost 4 days ago [-]
I don't really get why this is getting flagged; I've found this to be true, but more of a trade-off than a pure benefit. It's also somewhat beside the point: you always need to parse inputs from external, usually untrusted, sources.
doublesocket 4 days ago [-]
Agree with this. Mismatching types are generally an indicator of an underlying issue with the code, not the language itself. These are areas where AI can be helpful in flagging potential problems.
Rygian 4 days ago [-]
This sounds like the "stringly typed language" mockery of some languages. How is it actually different?
Kinrany 4 days ago [-]
It's not an alternative.
Start with a more dynamic type, do stuff that doesn't care about the shape, parse into a more precise type, do stuff that relies on the additional invariants, drop back into the more dynamic type again.
slopinthebag 4 days ago [-]
I find a balance is important. You can do nominal typing in a structural type system with branding, and you can kinda do structural typing in a nominal type system, but it's not as ergonomic. But you should probably end up doing a mix of both.
marcosdumay 4 days ago [-]
There are more than two alternatives, since functions can operate in more than one type.
noitpmeder 4 days ago [-]
This reminds me a bit of a recent publication by Stroustrup about using concepts... in C++ to validate integer conversions automatically where necessary.
{
Number<unsigned int> ii = 0;
Number<char> cc = '0';
ii = 2; // OK
ii = -2; // throws
cc = i; // OK if i is within cc’s range
cc = -17; // OK if char is signed; otherwise throws
cc = 1234; // throws if a char is 8 bits
}
fph 4 days ago [-]
The article quickly mentions implementing addition:
```
impl Add for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
alfons_foobar 4 days ago [-]
Would have to be F32, no?
I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
> I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this, although I'll note it returns an `Option` instead of a `Result` ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... ), leaving you to `.ok_or(...)` or otherwise handle the edge case to your heart's content. Niche, but occasionally what you want.
alfons_foobar 4 days ago [-]
> `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though!
I was confused at first how that could work, but then I realized that of course, with _unsigned_ integers this works fine because you cannot add a negative number...
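For the curious, a quick demo of the two std methods being discussed (`checked_add` and `saturating_add` are real `NonZeroU32` API on recent stable Rust):

```rust
use std::num::NonZeroU32;

fn main() {
    let one = NonZeroU32::new(1).unwrap();
    // checked_add: with an unsigned addend, overflow is the only way
    // to leave the nonzero domain, so Option covers it.
    assert_eq!(one.checked_add(2).map(NonZeroU32::get), Some(3));
    assert_eq!(NonZeroU32::new(u32::MAX).unwrap().checked_add(1), None);
    // saturating_add can promise a NonZeroU32 because it clamps at MAX.
    assert_eq!(one.saturating_add(u32::MAX).get(), u32::MAX);
    println!("ok");
}
```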
fph 3 days ago [-]
You'd still have to check for overflow, I imagine.
And there are other gotchas, for instance it seems natural to assume that NonZeroF32 * NonZeroF32 can return a NonZeroF32, but 1e-25 * 1e-25 = 0 because of underflow.
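The underflow gotcha is easy to demonstrate: the product of two nonzero `f32` values really can round to zero.

```rust
fn main() {
    let x: f32 = 1e-25;
    let y: f32 = 1e-25;
    assert!(x != 0.0 && y != 0.0);
    // true product is 1e-50, far below f32's smallest subnormal
    // (~1.4e-45), so it underflows to zero
    assert_eq!(x * y, 0.0);
    println!("ok");
}
```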
WJW 3 days ago [-]
I imagine it would be something like Option<NonZeroF32>, since -2.0 + 2.0 would violate the constraints at runtime. This gets us the Option handling problem back.
I think the article would have been better with NonZeroPositiveF32 as the example type, since then addition would be safe.
AxiomLab 3 days ago [-]
This exact philosophy is why I started treating UI design systems like compilers.
Instead of validating visual outputs after the fact (like linting CSS or manual design reviews), you parse the constraints upfront. If a layout component is strictly typed to only accept discrete grid multiples, an arbitrary 13px margin becomes a compile error, not a subjective design debate. It forces determinism.
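A hedged illustration of that idea (all names hypothetical): if margins can only be built from discrete grid steps, an arbitrary 13px value is unrepresentable rather than merely flagged.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum GridStep {
    X1, // 4px
    X2, // 8px
    X3, // 12px
}

impl GridStep {
    fn px(self) -> u32 {
        match self {
            GridStep::X1 => 4,
            GridStep::X2 => 8,
            GridStep::X3 => 12,
        }
    }
}

fn margin_css(step: GridStep) -> String {
    format!("margin: {}px;", step.px())
}

fn main() {
    // There is no way to pass 13 here; only grid multiples exist.
    assert_eq!(margin_css(GridStep::X3), "margin: 12px;");
    println!("ok");
}
```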
Garlef 3 days ago [-]
curious: what kind of tooling would you use here?
cadamsdotcom 3 days ago [-]
Parsing over validation, and newtypes for everything, fall over when you don’t know the full range of possibilities that can occur in the wild.
It is a handy way to prevent divide by zero as in the article, or to have fun with lambda calculus by asking the type system if 3 + 4 == 8. You can reason about the full range of inputs. Same for file format parsing - making as many failure modes as possible fail as early as possible!
But be VERY wary of using them to represent business logic or state machines that allow only the transitions you believe can exist at this point in time. You just don’t know what wild things people will want to do in business logic, and if your software can’t handle those scenarios, people will just work around it and your stuff no longer matches reality.
the__alchemist 4 days ago [-]
The examples in question propagate complexity throughout related code. I think this is a case I see frequently in Rust of using too many abstractions, and its associated complexities.
I would just (as a default; the situation varies)... validate prior to the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g. checking if the index is out of bounds. Similar idea; check; print or display an error, then fail that computation without crashing the program. Usually an indication of some bug, which can be tracked down. Or, if it's an array frequently indexed, use the (canonical for Rust's core) `get` method on whatever struct owns the array. It returns an Option.
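The std `get` pattern mentioned here, for reference:

```rust
// Indexing that returns an Option instead of panicking out of bounds.
fn main() {
    let v = vec![10, 20, 30];
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(9), None);
    // versus v[9], which would panic at runtime
    println!("ok");
}
```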
I do think either the article's approach, or validating is better than runtime crashes! There are many patterns in programming. Using Types in this way is something I see a lot of in OSS rust, but it is not my cup of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
cmovq 4 days ago [-]
Dividing a float by zero is usually perfectly valid. It has predictable outputs, and for some algorithms like collision detection this property is used to remove branches.
kibwen 3 days ago [-]
The problem is that "division" in the context of computation refers to multiple different operations, so that we all end up talking past each other. Some people want a `division_where_zero_is_an_illegal_denominator`, some people want `division_but_return_zero_if_denominator_is_zero`, and I'm sure other people want weirder things. Honestly, I'm coming around to the idea that for any precise and reliable system, the fact that we only have a single "operator" (+ - * / etc) that we arbitrarily assign to one of the many possible operations that we want to do does more harm than good. Consider how many different ways there are just to add a number (panic on overflow, error value on overflow, saturate on overflow, wrap on overflow, return tuple with optional carry flag on overflow...).
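The "many additions" point, concretely: `u8` alone already carries several distinct, explicitly named addition operations in std.

```rust
fn main() {
    let a: u8 = 200;
    assert_eq!(a.checked_add(100), None);           // error value on overflow
    assert_eq!(a.wrapping_add(100), 44);            // wrap: 300 mod 256
    assert_eq!(a.saturating_add(100), 255);         // clamp at u8::MAX
    assert_eq!(a.overflowing_add(100), (44, true)); // value plus carry flag
    println!("ok");
}
```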
woodruffw 4 days ago [-]
I think “has predictable outputs” is less valuable than “has expected outputs” for most workloads. Dividing by zero almost always reflects an unintended state, so proceeding with the operation means compounding the error state.
(This isn’t to say it’s always wrong, but that having it be an error state by default seems very reasonable to me.)
ubixar 3 days ago [-]
C# gets close to this with records + pattern matching, F# discriminated unions are even better for this with algebraic data types built right in. A Result<'T,'Error> makes invalid states unrepresentable without any ceremony. C# records/matching works for now, but native DUs will make it even nicer.
barnacs 3 days ago [-]
Every time you introduce a type for a "value invariant" you lose compatibility and force others to make cumbersome type conversions.
To me, invalid values are best expressed with optional error returns alongside the value, as part of the function signature. Types are best used to encode only information about the hierarchy of structures composed of primitive types. They help define and navigate the representation of composite things, as opposed to just having dynamic nested maps of arbitrary strings.
mrkeen 3 days ago [-]
> They help define and navigate the representation of composite things as opposed to just having dynamic nested maps of arbitrary strings.
What would you say to someone who thinks that nested maps of arbitrary strings have maximum compatibility, and using types forces others to make cumbersome type conversions?
barnacs 3 days ago [-]
If the fields of a structure or the string keys of an untyped map don't match then you don't have compatibility either way. The same is not true for restricting the set of valid values.
edit: To put it differently: To possibly be compatible with the nested "Circle" map, you need to know it is supposed to have a "Radius" key that is supposed to be a float. Type definitions just make this explicit. But just because your "Radius" can't be 0, you shouldn't make it incompatible with everything else operating on floats in general.
sam0x17 4 days ago [-]
btw the “quoth” crate makes it really easy to implement scannerless parsing in Rust for arbitrary syntax; I use it on many of my projects
IshKebab 4 days ago [-]
Interesting looking crate. You don't seem to have any examples at all though so I wouldn't say it makes it easy!
sam0x17 19 hours ago [-]
Yeah thanks for highlighting this, I should add some examples
MarcLore 3 days ago [-]
[dead]
bbkane 3 days ago [-]
I try to do this in Go as well, but the amount of pointers used to represent optional fields (more generally the lack of algebraic types) is making it less effective/ more verbose than I'd like
qsera 3 days ago [-]
[flagged]
throw310822 3 days ago [-]
I only half understand this stuff, but all this encapsulation of values so that they are guaranteed to remain valid across manipulations... isn't it called Object Oriented Programming?
mrkeen 3 days ago [-]
Was the original blog post wrong?
qsera 3 days ago [-]
were you validating during parsing before?
instig007 3 days ago [-]
Validating during parsing is still parsing; there's a reason why `Alternative f` exists after all: you have to choose between branches of possibilities and falsehoods. Now consider that there's another kind of validation that happens outside of program boundaries (where broader-than-needed data is being constrained in a callee rather than at the calling site) that should've been expressed as `Alternative f` during parsing instead. That's the main point of the article, but you seem to only focus on the literal occurrence of the word "validation" here and there.
qsera 2 days ago [-]
So you are saying that if at a certain point in parsing the only expected terms are 'a', 'b' and 'c', one should not put the corresponding parsed entry in a `char` (after checking it is either of these aka validating), and instead it should be put in some kind of enum type (parsed via `Alternative f`). Right?
instig007 2 days ago [-]
You put them however you like, be it in a char or a vector of them, but the bottom line is that your parsed items carry the "sanitized" label that allows you to tuple-unpack or pattern-match (as long as it's near or literally zero-cost) without performing the same validation ever again for the lifetime of the parsed object. The callees that exclusively expect 'a', 'b' and 'c', and which currently perform an internal validation step, should be replaced with versions that permit inputs with sanitized labels only. How you implement the labels depends on the language at hand; in Haskell they can be newtypes or labelled GADTs. But the crucial part is: the "validation" word is systematically pushed to the program boundaries, where it's made part of the parsing interface, with `Alternative f` and sanitization labels acting on raw data. In other words, you collapse validation into the process of parsing, where the result value is assembled from a sequence of decisions to branch either with one of the possible successful options or with an error.
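A hedged Rust analogue of the 'a'/'b'/'c' example (hypothetical names): the check runs exactly once, at the boundary, and afterwards the result type itself carries the guarantee.

```rust
// Parse the raw char once; every callee then takes `Abc` and
// never re-validates.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Abc {
    A,
    B,
    C,
}

impl Abc {
    fn parse(c: char) -> Option<Abc> {
        match c {
            'a' => Some(Abc::A),
            'b' => Some(Abc::B),
            'c' => Some(Abc::C),
            _ => None,
        }
    }
}

// No validation here: the input type is the "sanitized label".
fn handle(x: Abc) -> &'static str {
    match x {
        Abc::A => "alpha",
        Abc::B => "beta",
        Abc::C => "gamma",
    }
}

fn main() {
    assert_eq!(Abc::parse('z'), None);
    assert_eq!(Abc::parse('b').map(handle), Some("beta"));
    println!("ok");
}
```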
qsera 1 day ago [-]
> but the crucial part is: the "validation" word is systematically pushed to the program boundaries
Yea, so again. Isn't that freaking obvious?! The author seems to be experienced in Haskell, where this kind of thing is common knowledge, and for some reason this seems to be some kind of revelation to them...
instig007 23 hours ago [-]
> Yea, so again. Isn't that freaking obvious?!
apparently not, as I always find snippets of patterns of this kind from my coworkers (and I've worked in many companies, including the ones that require precision for legal compliance):
    def do_business_stuff(data):
        orders = data.get("orders")
        if not orders:
            return
        for order in orders:
            attr = order.get("attr")
            if attr and len(attr) < 5:
                continue
            ...
The industry's awareness baseline is very low, and it's across tech stacks, Haskell is no exception. I've seen stuff people do with Haskell at 9 to 5 when the only thing devs cared about was to carry on (and preferably migrate to Go), and I wasn't impressed at all (compared to pure gems that can be found on Hackage). So in that sense having the article that says "actually parse once, don't validate everywhere" is very useful, as you can keep sending the link over and over again until people either get tired of you or learn the pattern.
qsera 22 hours ago [-]
But in all seriousness, devs could be both aware of it and indifferent to it at the same time.
And sometimes, if you are not sure about the constraints 100%, it might even be the safe thing to do.
[0] https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
For full-on parse-don't-validate, you essentially need a dependent type system. As a more light-weight partial solution, Rust has been prototyping pattern types, which are types constrained by patterns. For instance a range-restricted integer type could be simply spelled `i8 is 0..100`, or a nonempty slice as `[T] is [_, ..]`. Such a feature would certainly make correctness-by-construction easier in many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example of the clash between practicality and theoretical purity. It can't offer you a slice (consecutive view) of its elements without storing the first element twice (which requires immutability and that T: Clone, unlike normal Vec<T>), which makes it fairly useless as a vector. It's okay if you consider it just an abstract list with a more restricted interface.
Scala 3 is much more mainstream and has path-dependent types. I've only used Scala 2, and there the boilerplate for dependent types was frustrating imo, but I've heard it's better in 3.
Scala 3 indeed is more mainstream, but it also seems to be on the way out. At least here in the corporate world it is largely being replaced by Kotlin and Java 21+.
EDIT: Parent comment was edited to amend the "impossible/unrepresentable" wording
The invariant is less obvious, but you could still do this if you really wanted. The observation is that addition of two fixed-size nonnegative integers cannot overflow unless at least one is greater than half the range. So your `non_overflowing_addition` function would need to take two inputs of type `NonNegativeLessThanHalfMaxValue`, where the constructor for the latter type enforces the invariant. Multiplication is similar, but with the square root of the range (and I suppose `NonNegativeLessThanSqrtMaxValue` could be a subtype of `NonNegativeLessThanHalfMaxValue` if you want to be fancy).
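A rough Rust sketch of this idea (the `LessThanHalfMax` name is hypothetical; the comment's `NonNegativeLessThanHalfMaxValue` is abbreviated here):

```rust
// Invariant: 0 <= value <= i32::MAX / 2, checked once at construction.
#[derive(Copy, Clone, Debug)]
struct LessThanHalfMax(i32);

impl LessThanHalfMax {
    fn new(v: i32) -> Option<Self> {
        (v >= 0 && v <= i32::MAX / 2).then_some(Self(v))
    }

    fn get(self) -> i32 {
        self.0
    }
}

// Total function: the input types guarantee the sum fits in an i32,
// so no overflow check is needed here.
fn non_overflowing_add(a: LessThanHalfMax, b: LessThanHalfMax) -> i32 {
    a.get() + b.get()
}
```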
Your intention of having the restricted domain "NonNegativeLessThanHalfMaxValue" is probably so that both addends have the same domain. If you go down that route, perhaps you'll also want the closure property, meaning that the range should also belong to the same set. However, then you have to deal with the overflow problem all over again...
The real point is that when adding two N-bit integers, the range must be N+1 bits, because the "carry bit" is also part of the output. I think this is a scenario where "Parse, Don't Validate" can't easily help, because the validity of the addition is intrinsically a function of both inputs together.
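The "N+1 bits" point can also be expressed directly in the return type, by widening instead of restricting the inputs:

```rust
// The sum of two u8 values always fits in a u16 (255 + 255 = 510),
// so returning the wider type makes the addition total.
fn add_widening(a: u8, b: u8) -> u16 {
    a as u16 + b as u16
}
```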
also:
Parse, Don’t Validate – Some C Safety Tips - https://news.ycombinator.com/item?id=44507405 - July 2025 (73 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=41031585 - July 2024 (102 comments)
Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)
Parsix: Parse Don't Validate - https://news.ycombinator.com/item?id=27166162 - May 2021 (107 comments)
Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)
Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)
(p.s. these links are just to satisfy extra-curious readers - no criticism is intended! I add this because people sometimes assume otherwise)
"In Idris, a length-indexed vector is Vect n a (length n is in the type), and a valid index into length n is Fin n ('a natural number strictly less than n')."
Similar tricks work with division that might result in inf/-inf, preventing it from typechecking, and there are more subtle implications in e.g. higher-order types and functions.
Which refers to https://docs.rs/anodized/latest/anodized/
Dependent typing requires:
- Generic types that can take runtime values as parameters, e.g. [u8; user_input]
- Functions where the type of one parameter depends on the runtime value of another parameter, e.g. fn f(len: usize, data: [u8; len])
- Structs/tuples where the type of one field depends on the runtime value of another field, e.g. struct Vec { len: usize, data: [u8; self.len] }
Not that I blame them: nobody has figured out practical dependent typing yet (though Idris is making good progress in the field).
Type-Driven Development with Idris[1] is a great introduction for dependently typed languages and covers methods such as these if you're interested (and Edwin Brady is a great teacher).
[1] https://www.manning.com/books/type-driven-development-with-i...
Maybe Int
So your program splits into two branches:
1. Nothing branch: you failed to obtain an Int.
There is no integer to use as an index, so you can’t even attempt a safe lookup into something like Vect n a.
2. Just i branch: you do have an Int called i.
But an Int is not automatically a valid index for Vect n a, because vectors are indexed by Fin n (a proof carrying “bounded natural”).
So inside the Just i branch, you refine further:
3. Try to turn the runtime integer i into a value of type Fin n.
There are two typical shapes of this step:
* Checked conversion returning Maybe (Fin n)
If the integer is in range, you get Just (fin : Fin n). Otherwise Nothing.
* Checked conversion returning evidence (proof) that it’s in range
For example: produce k : Nat plus a proof like k < n (or LTE (S k) n), and then you can construct Fin n from that evidence.
(But it’s basically the same: you end up with a Maybe (LTE …).)
Now if you also have a vector: xs : Vect n a
… the n in Fin n and the n in Vect n a are the same n (that’s what “unifies” means here: the types line up), so you can do: index fin xs : a
And crucially:
there is no branch in which you can call index without having constructed the Fin n first, so out-of-bounds access is unrepresentable (it’s not “checked later”, it’s “cannot be expressed”).
And within _that_ branch of the program, you have a proof of Fin n.
Said differently: you don’t get “compile-time knowledge of i”; you get a compile-time guarantee that whatever value you ended up with satisfies a predicate.
Concretely: you run a runtime check i < n. _ONCE_
If it fails, you’re in a branch where you do not have Fin n.
If it succeeds, you construct fin : Fin n at runtime (it’s a value, you can’t get around that), but its type encodes the invariant “in bounds”/check was done somewhere in this branch.
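A rough Rust analogue of this flow, using const generics (unlike Idris's `Fin n`, the bound `N` must be a compile-time constant here; names are illustrative):

```rust
// An index proven in range for length N at construction time.
struct Fin<const N: usize>(usize);

impl<const N: usize> Fin<N> {
    // The single runtime check: succeed only if i < N.
    fn new(i: usize) -> Option<Self> {
        (i < N).then_some(Self(i))
    }
}

// No bounds check needed: a Fin<N> can only hold values < N,
// and the array length and the index bound share the same N.
fn index<T, const N: usize>(xs: &[T; N], i: Fin<N>) -> &T {
    &xs[i.0]
}
```

There is no branch in which `index` can be called without first constructing the `Fin<N>`, which is exactly the "check once, carry the guarantee in the type" shape described above.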
If we take their "parse" approach, then the types of the arguments a, b, and c have to somehow encode the constraint `b^2 - 4ac >= 0`. This would be a total mess--I can't think of any clean way to do this in Rust. It makes _much_ more sense to simply return an Option and do the validation within the function.
In general, I think validation is often the best way to solve the problem. The only counterexample, which the author fixates on in the post, is when one particular value is constrained in a clean, statically verifiable way. Most of the time, validation is used to check (possibly complex) interactions between multiple values, and "parsing" isn't at all convenient.
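The quadratic-formula case from the comment above, done the "validate and return Option" way (a sketch; the function name is made up):

```rust
// Returns the real roots of a*x^2 + b*x + c = 0, or None when the
// discriminant b^2 - 4ac is negative (or the equation isn't quadratic).
// The constraint spans all three inputs, so it's checked here rather
// than encoded in the argument types.
fn real_roots(a: f64, b: f64, c: f64) -> Option<(f64, f64)> {
    if a == 0.0 {
        return None;
    }
    let disc = b * b - 4.0 * a * c;
    if disc < 0.0 {
        return None;
    }
    let s = disc.sqrt();
    Some(((-b + s) / (2.0 * a), (-b - s) / (2.0 * a)))
}
```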
Sure, we can follow the advice of creating types that represent only valid states, but then we end up with `fn(a: A, b: B, c: C)` transformed into `fn(abc: ValidABC)`.
Like how clojure basically uses maps everywhere and the whole standard library allows you to manipulate them in various ways.
The main problem with the many-types approach is that you end up with several similar but slightly different types, all incompatible with each other.
The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.
The real world often changes though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.
Start with a more dynamic type, do stuff that doesn't care about the shape, parse into a more precise type, do stuff that relies on the additional invariants, drop back into the more dynamic type again.
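That dynamic-to-precise-to-dynamic flow can be sketched like this, using a string map as the stand-in for loosely structured data (the `User` shape and field names are made up):

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct User {
    name: String,
    age: u8,
}

// Parse: dynamic shape -> precise type. All validation happens here.
fn parse_user(raw: &HashMap<String, String>) -> Option<User> {
    Some(User {
        name: raw.get("name")?.clone(),
        age: raw.get("age")?.parse().ok()?,
    })
}

// Drop back into the dynamic shape once the invariant-heavy work is done.
fn to_map(u: &User) -> HashMap<String, String> {
    HashMap::from([
        ("name".to_string(), u.name.clone()),
        ("age".to_string(), u.age.to_string()),
    ])
}
```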
https://www.stroustrup.com/Concept-based-GP.pdf
```
impl Add for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though! ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... )
> I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this, although I'll note it returns an `Option` instead of a `Result` ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... ), leaving you to `.ok_or(...)` or otherwise handle the edge case to your heart's content. Niche, but occasionally what you want.
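In use, it looks like this: adding a `u32` to a nonzero value can only fail by overflow, never by producing zero, so a `Some` result is still `NonZeroU32`:

```rust
use std::num::NonZeroU32;

fn main() {
    let n = NonZeroU32::new(5).unwrap();

    // In range: the sum stays provably nonzero.
    assert_eq!(n.checked_add(3).map(NonZeroU32::get), Some(8));

    // Overflow: checked_add returns None instead of wrapping.
    assert!(n.checked_add(u32::MAX).is_none());
}
```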
I was confused at first how that could work, but then I realized that of course, with _unsigned_ integers this works fine because you cannot add a negative number...
And there are other gotchas, for instance it seems natural to assume that NonZeroF32 * NonZeroF32 can return a NonZeroF32, but 1e-25 * 1e-25 = 0 because of underflow.
I think the article would have been better with NonZeroPositiveF32 as the example type, since then addition would be safe.
Instead of validating visual outputs after the fact (like linting CSS or manual design reviews), you parse the constraints upfront. If a layout component is strictly typed to only accept discrete grid multiples, an arbitrary 13px margin becomes a compile error, not a subjective design debate. It forces determinism.
It is a handy way to prevent divide by zero as in the article, or to have fun with lambda calculus by asking the type system if 3 + 4 == 8. You can reason about the full range of inputs. Same for file format parsing - making as many failure modes as possible fail as early as possible!
But be VERY wary of using them to represent business logic or state machines that allow only the transitions you believe can exist at this point in time. You just don’t know what wild things people will want to do in business logic, and if your software can’t handle those scenarios, people will just work around it and your stuff no longer matches reality.
I would just (as a default; the situation varies)... validate prior to the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g. checking if the index is out of bounds. Similar idea: check, print or display an error, then fail that computation without crashing the program. It's usually an indication of some bug, which can be tracked down. Or, if it's an array that is frequently indexed, use the canonical `get` method (as in Rust's core) on whatever struct owns the array. It returns an Option.
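For reference, the standard `get` on a slice or array turns an out-of-bounds index into a value you must handle instead of a panic:

```rust
fn main() {
    let xs = [1, 2, 3];

    // In bounds: Some(&element).
    assert_eq!(xs.get(1), Some(&2));

    // Out of bounds: None, no panic.
    assert_eq!(xs.get(10), None);
}
```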
I do think either the article's approach, or validating is better than runtime crashes! There are many patterns in programming. Using Types in this way is something I see a lot of in OSS rust, but it is not my cup of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
(This isn’t to say it’s always wrong, but that having it be an error state by default seems very reasonable to me.)
To me, invalid values are best expressed with optional error returns alongside the value, as part of the function signature. Types are best used to encode information about the hierarchy of structures composed of primitive types: they help define and navigate the representation of composite things, as opposed to just having dynamic nested maps of arbitrary strings.
What would you say to someone who thinks that nested maps of arbitrary strings have maximum compatibility, and using types forces others to make cumbersome type conversions?
edit: To put it differently: To possibly be compatible with the nested "Circle" map, you need to know it is supposed to have a "Radius" key that is supposed to be a float. Type definitions just make this explicit. But just because your "Radius" can't be 0, you shouldn't make it incompatible with everything else operating on floats in general.
Yea, so again: isn't that freaking obvious?! The author seems to be experienced in Haskell, where this kind of thing is common knowledge, and for some reason this seems to be some kind of revelation to them...
apparently not, as I always find snippets of patterns of this kind from my coworkers (and I've worked in many companies, including the ones that require precision for legal compliance):
The industry's awareness baseline is very low, and it's across tech stacks; Haskell is no exception. I've seen stuff people do with Haskell at 9 to 5 when the only thing devs cared about was to carry on (and preferably migrate to Go), and I wasn't impressed at all (compared to the pure gems that can be found on Hackage). So in that sense, having an article that says "actually, parse once, don't validate everywhere" is very useful, as you can keep sending the link over and over again until people either get tired of you or learn the pattern. And sometimes, if you are not 100% sure about the constraints, it might even be the safe thing to do.