> Built in Rust for performance and in Python for extensibility
Omg, a team that knows how to selectively use tech as needed. Looking at the Rust web developers in the corner.
I do think Rust would be better for web dev if it had GC, but it doesn't, and no other language comes close to matching its ergonomics otherwise. And the memory management is something you learn once; after that it's a bit verbose, but no big deal. If you feel like you absolutely have to write custom data structures with circular references for your web server, then I tentatively suggest that maybe you're doing web dev wrong.
On my team we onboarded a data scientist, used to working in Python and with no prior Rust experience, onto a Rust project, and it was just not a big deal. Maybe I'm just fortunate when it comes to colleagues.
Sounds like nonsense. Again, I asked in another comment for an example of some Rust “web” code that exemplifies what you are talking about. You mentioned “ergonomics”, and you mentioned “noob friendly”. I’d love to see some of this Rust code.
Show us, we’ll discuss.
I feel like some devs are so insecure that they really think... ugh, I can’t even fully explain the pathology of the Rust people without cursing them out.
You are not a better developer, that’s ALL I want to say to the Rust people. In fact, most of you are bad developers for doing what you have been doing with this language. You ALL must find a better way to show your intellectual prowess.
I heard you guys are even bugging the Linux people.
Unsure if the implication is that Rust is poorly suited for web development or what.
It is, in my opinion (as an avid Rust user!). The type errors from most of the major web frameworks/ORMs (diesel, sqlx) are just awful, more often than not. Usually some inscrutable thing involving Send/Sync. Or some hilariously complicated type/trait hackery on the part of the library, attempting to save me from the former, that I'm never going to figure out.
Great language in many other settings, but not this one. At least not right now, but given my experience with async Rust in general, I'm not sure it ever will be.
> Usually some inscrutable thing involving Send/Sync.
What kind of queries are you writing? I never see SQLx emit anything like that. I always get back SQL errors.
Column mismatches are what I hit most in development, and they're pretty explicit.
The hairiest thing I see with SQLx is when I try to write custom type conversions for my own types to SQL fields. I sometimes have to delve into macros. But those errors are pretty self-explanatory too.
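For reference, the simplest flavor of that kind of conversion doesn't even need a manual impl. A minimal sketch (the UserId newtype is invented for illustration, assuming a BIGINT column):

    // Deriving sqlx::Type with `transparent` makes the wrapper
    // encode/decode exactly like its inner column type.
    #[derive(Debug, sqlx::Type)]
    #[sqlx(transparent)]
    struct UserId(i64);

    // UserId can now appear directly in .bind() calls and query_as rows, e.g.:
    // sqlx::query_as::<_, (UserId, String)>("SELECT id, name FROM users")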
Rust is emerging as one of the best web programming languages out there.
Actix and Axum feel like Python's Flask.
Rust has decent Redis and connection pool libraries, but the SQL space needs more work. Diesel is too ORM-y (I've never liked ORMs). While SQLx allows you to write "typechecked" SQL, it still has really annoying edge cases (WHERE IN clauses can't be typechecked, as sketched below; type bindings can get hairy; etc.)
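For the WHERE IN case specifically, the usual Postgres workaround is to bind the whole list as one array parameter and compare with = ANY($1). A sketch (table and column names invented for illustration):

    use sqlx::PgPool;

    // IN ($1, $2, ...) needs a variable number of placeholders, which the
    // compile-time macros can't check; = ANY($1) is a single array bind.
    async fn users_by_ids(pool: &PgPool, ids: &[i32]) -> sqlx::Result<Vec<(i32, String)>> {
        sqlx::query_as::<_, (i32, String)>("SELECT id, name FROM users WHERE id = ANY($1)")
            .bind(ids)
            .fetch_all(pool)
            .await
    }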
I'm not very happy about the state of Rust's Elasticsearch libraries, either.
Rust probably needs a Rails/Django-like framework too for those that prefer a framework-oriented development lifecycle.
Rust also needs some observability frameworks. There are a few, but the choices are sparse.
I'd give Rust a 7.5/10 for web programming, and as far as the promise of the language goes, I'd give it an 11/10. Developing in Actix and Axum feels amazing. It's honestly better than Go and Python. The other pieces (database, API clients, etc.) will presumably get better in time.
And because of the way HTTP request flow logic is typically structured, 99.9% of the time you'll never hit Rust's borrow checker or have to worry about lifetimes. It's as if you've been given one of the best typed languages, best package managers, and nearly no tradeoffs. The server compiles down to a single static binary. It's multithreaded, and it's blindingly fast.
I'm picking Rust for every new web service I write these days.
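To give a flavor of the ergonomics, here's roughly what a minimal Axum service looks like. A sketch with invented route and type names, assuming axum 0.7, tokio, and serde:

    use axum::{extract::Path, routing::get, Json, Router};
    use serde::Serialize;

    #[derive(Serialize)]
    struct Greeting {
        message: String,
    }

    // Typed path extraction and JSON responses, Flask-style; no
    // lifetimes or borrow-checker ceremony in sight.
    async fn greet(Path(name): Path<String>) -> Json<Greeting> {
        Json(Greeting { message: format!("hello, {name}") })
    }

    #[tokio::main]
    async fn main() {
        let app = Router::new().route("/greet/:name", get(greet));
        let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
        axum::serve(listener, app).await.unwrap();
    }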
This is obviously heavily biased, because there is no way any reasonable person would think Axum or Actix are like Flask. That's just not possible with a language like Rust. The Rust standard library is horrible compared to Python or Go.
You need more dependencies to build a simple API in Rust than you need in Python and Go combined.
Axum, tokio, serde, serde_json, anyhow, sqlx and probably 5 more to fix the bad standard library.
In Python and Go you can build a web app with just the standard library.
TBH, after adding in a database, Rust is probably not that much faster than Go. And Go has everything you need in the standard library, compiles to a binary, and the package manager doesn't matter because you don't need one with Go.
The standard lib is not "bad"; it's just modular, to avoid C++-like pitfalls.
People doing web dev in Python could use Python's native json module, for example, but they rarely do because there are far more performant options.
Nowhere close to the plethora of tooling and frameworks available in the Java and .NET ecosystems for all kinds of distributed computing scenarios.
And if one misses an advanced ML-style type system, Scala, Kotlin, and F# are there.
Show me some Rust web code that you think exemplifies what you are talking about. I think your example will speak for itself and close this argument. No reason to go back and forth.
Same here; Rust with Actix can even replace nginx.
This. People are sleeping on one of the biggest developments to hit backend. They'll know soon enough.
I'm too tired to respond to the two detractors, but it's hilarious that one of the arguments against Rust is pulling in packages. Some of the best packages, at that. I wonder if that's their argument against most other languages.
Big standard libraries are a mistake, because the language is forever left with shitty old design decisions. Python's standard library is full of crap.
I might just put together a side-by-side Flask / Go / Rust comparison. It'll be so damning against Python and Go. Rust is the same LOC count and complexity, yet it's a nicer language with a better type system, and it's as fast as nginx.
People don't know how good Rust webdev is.
The complaint about serde in particular... Everyone and their dog includes Jackson for webdev in Java.
Jackson is awful. Serde on the other hand is the smoothest JSON library I've ever used.
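Concretely, the whole JSON round-trip is one derive. A sketch (the User struct is invented for illustration):

    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize, Debug)]
    struct User {
        id: u64,
        name: String,
    }

    fn main() -> serde_json::Result<()> {
        // One derive gives you both directions, with field types checked.
        let user: User = serde_json::from_str(r#"{"id": 1, "name": "ada"}"#)?;
        println!("{}", serde_json::to_string(&user)?); // {"id":1,"name":"ada"}
        Ok(())
    }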
They actually implemented a decent amount of the HTTP stuff in Rust, if you look at the docs.
Nvidia not name products after existing things in the ML space challenge: IMPOSSIBLE
More seriously, though:
> OpenAI Compatible Frontend – High performance OpenAI compatible http api server written in Rust.
Is this normal in this space? I know everyone has settled on copying the S3 API for object storage but I’m unsure if we’ve done the same for LLM serving.
Increasingly so. Many other popular inference tools in this space also expose an OpenAI-compatible API: vLLM, llama.cpp, and LiteLLM all do.
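That convergence is the point: client code only needs its base URL swapped to move between servers. A rough sketch in Rust (reqwest + serde_json; the URL and model name are placeholders):

    use serde_json::json;

    #[tokio::main]
    async fn main() -> Result<(), reqwest::Error> {
        // Any OpenAI-compatible server works here: vLLM, llama.cpp, Dynamo, ...
        let base = "http://localhost:8000/v1";
        let body = reqwest::Client::new()
            .post(format!("{base}/chat/completions"))
            .json(&json!({
                "model": "placeholder-model",
                "messages": [{ "role": "user", "content": "hello" }]
            }))
            .send()
            .await?
            .text()
            .await?;
        println!("{body}");
        Ok(())
    }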
So this replaces Triton for LLMs, or?
This is very narrowly focused on LLMs, whereas Triton is still useful for running all kinds of ML models. In practice, Triton is a very poor choice for LLMs specifically because it has none of the required, non-negotiable features like KV caching built in.
Same question here. Just asked Grok for a comparison: https://grok.com/share/bGVnYWN5_fa210574-f27b-45ae-9d95-19ed...
As someone who spent the better part of a year trying to get various Nvidia inference products to work _at all_ even with a direct line to their developers, I will simply say "beware".
Just curious what your issues with Triton were. We've done OK with it using it to serve LLM models w/ a classifier head via HF Transformers pipeline & Flash Attention 2, as well as serving text generation models with the vLLM back-end.
Triton is not that bad at all, considering the wide scope of systems it has to support (TensorRT, ONNX, multiple generations of PyTorch, CUDA, Python). It was much nicer than the old TorchServe project, which was JVM-based.
I've done very little with Nvidia software, but what I have done puts me off ever doing it again. I quit a job partially because it involved trying to get their shit to work. (There were other factors, but that was definitely on the 'GTFO' side)
Can you share some of your wisdom on setting up a scalable inference infrastructure?
Use Ray Serve. https://docs.ray.io/en/latest/serve/index.html
As someone who has run LLMs in production, using Ray is probably the worst idea. It's not optimized for language models, and is extremely slow. It has no KV caching, no model parallelism, and none of the other basic table-stakes features offered by Dynamo and other open-source inference frameworks. Useful only if you have <1 QPS.
Use SGLang, vLLM, or text-generation-inference instead.
It really depends on the task. If you have 1 massive job, Ray sucks and doesn't provide table stakes. If you have 50M tiny jobs, Ray and KubeRay are great and serve as the backbone of several billion-dollar products.
Good for the goose, good for the gander...
This is probably true, but unlike every Nvidia product we tried, it did, you know, reply to inference requests with actual output. That said, you can serve vLLM with Ray Serve. https://docs.ray.io/en/latest/serve/tutorials/vllm-example.h...
Ray doesn't offer anything if you use vLLM on top of Ray Serve though.
It does if you need pipeline parallelism across multiple nodes.
is this in reference to Triton?
And NIM, yes.