Turns out it was a rug pull.
They open-sourced a crippled version of Sesame (1B),
not the one they're using in the actual demo.
a16z won't allow it. For them, there is money to be made.
Very disappointing.
Really is. Seems like they lost their nerve. Their credibility is in the toilet now. Clearly not good faith actors.
Which is a shame, because it seems like there's a real opportunity to shake things up with an open voice model that's competitive with the proprietary ones.
Oh well. Someone else will do what they claimed to want to do.
If you want to try this on Mac this Python library worked for me: https://github.com/senstella/csm-mlx
You can run it with uv like this:
this is great, thanks!!
is there any reason why it inserts multiple-seconds-long awkward pauses into the output? are you seeing that behavior in your example? (on a 2021 M1 Max MBP)
Oh interesting, no I haven't spotted that. I'm on an M2, but I haven't spent a great deal of time poking at it with longer inputs.
The Sesame demo is really impressive, but the fact that they record and review your conversations makes it a complete non-starter for me. I can't feel comfortable knowing some person could actually listen to everything we say. Open sourcing it is great, since I could self-host it, although it seems like you can't quite get to something similar to the demo from this, so I'm not sure what the point is.
Is any provider already hosting this (similar to how many providers host Whisper for STT)? Looks like it doesn't support streaming though (same with Whisper, coincidentally), but it's great to see open models get so much better.
It's actually useless. It's very slow, the quality is suboptimal, and it's only the speech generation component. See the discussion here:
https://github.com/SesameAILabs/csm/issues/80
This just goes to show that, especially with voice AI, people should be thinking in terms of "systems", not "agents". Sesame claims this slow, barebones base model is what their demo is built around. Whether or not that's true, there's a whole lot more that goes into a slick demo like Sesame's Maya and Miles than just hooking up a few models together.
Neat project! How does it handle different accents or speech speeds—does it need a lot of training data for that? Excited to see more open-source stuff in this space.
OpenSesame would be a great project name!