I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?
> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.
The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
> The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
They have a page directly addressed to AI companies, offering them "enterprise-level" access to their complete archives in exchange for tens of thousands of dollars. AI may not be their original/primary motivation but they are evidently on board with facilitating AI labs piracy-maxxing.
You go where the money is. Infra isn’t free. Churches pass the plate every Sunday. Perhaps one day we’ll exist in a more optimal socioeconomic system; until then, you do what you have to do to accomplish your goals (in this context, archivists and digital preservation).
There is a certain irony in people providing copyrighted works for free justifying profiting from these copyrights on the basis that providing the works to others isn’t free.
I'd have a lot more sympathy if the music industry didn't try all of the worst available options to handle piracy for years and years.
They had many opportunities to get out ahead of it, and they squandered it trying to cling to album sales where 11/13 tracks were trash. They are in a bed of their own making.
Your link doesn’t work. But I assume you are talking about this label? I looked at the first artist and I found the artist’s music on iTunes. Everything that Apple sells on the iTunes Music Store has been DRM free AAC or ALAC (Apple lossless) since 2009.
Cost recovery isn’t profit. Copyright is just a shared delusion, like most laws. They’re just bits on a disk we’re told are special for ~100 years (or whatever the copyright lockup length is in your jurisdiction), after which they’re no longer special (having entered the public domain).
I think what is more ironic is we somehow were comfortable being collectively conditioned (manufactured consent?) with the idea that you could lock up culture for 100 years or more just to enable maximum economic extraction from the concept of “intellectual property” and that to evade such insanity is wrong in some way. “You can just do things” after all.
> Your savings account is just bits on a disk, yet presumably it represents value that you worked for and which belongs to you to do with what you wish.
That's another example of the shared delusion, since yes, we tell each other it represents labor and resources, and the market engages in allocation somewhat efficiently, and so the money is a pretty accurate representation of the value of labor and the value of resources.
In reality, that's not true, because the most highly compensated jobs are some of the least valuable, such as investment bankers, landlords, or being born rich (which isn't even a job, but is compensated anyway). Rent seeking is one of the most highly compensated things you can do under this system, but also one of the most parasitic and least valuable things.
Your savings account's number is totally detached from accurately representing value. It's mostly a representation of where you were born.
Everyone is doing it, who cares anymore. Genie's out of the bottle, we could've tried to solve this for decades and yet we didn't so now we reap what we sowed. Happens, move on.
That made me chuckle: Enterprise Level Access. I mean, for an AI company that’s incredibly cheap; instead of torrenting something, why not just get it? That price is just a fraction of an engineer’s salary.
> I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
Spotify is $12/month at most to get unlimited ad-free access to virtually all music.
To get access to "all" TV content legally would be hundreds of dollars a month. And for many movies you must buy/rent each individually. And legal TV and movies are much more encumbered by DRM and lock in, limiting the way you can view them. (like many streaming apps removing AirPlay support, or limiting you to 720p in some browsers)
I think Spotify wins over pirating because of its relatively low cost and convenience. Pirating of TV/movies has increased as the cost to access them has.
> The Anna’s archive group is ideologically motivated.
Very interesting, thank you. So using this for AI will just be a side effect.
And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.
I think the people seeding these are also ideologues and so would be interested in also supporting the obscure stuff, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.
Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.
> The Anna’s archive group is ideologically motivated.
Anna’s archive business is stealing copyrighted content and selling access to it. It's not ideologically motivated.
What ideology is about pirating books and music where most of the people producing this stuff cannot afford to do it full-time? It's not like pirating movies, software and large videogame studios, which is still piracy, but they also make big money and they don't act all the time in the interests of the users.
Writers and musicians are mostly broke. If we add up the rising cost of living, AI-generated content and piracy, there's almost no reward left for their work. Anna’s archive is contributing to the decline of art and culture. They sell you premium bandwidth for downloading and training your AIs on copyrighted content, so soon we can all generate more and more slop.
Agreed. I see far too many people rationalizing piracy as a principled thing to do. Instead of finding ways to improve the market such that the control of content isn't siloed in monopolistic corporations, many celebrate Anna's Archive, which is itself a more or less monopolistic profit-interested entity. The major difference being that we don't have to pay directly. The cost continues to fall on the writers and artists and the industry suffers.
> Instead of finding ways to improve the market such that the control of content isn't siloed in monopolistic corporations
I always assumed the "Anna" in the name was for "Anarchist." My assumption about the archive is that they don't believe there's an ethical solution to the restriction of access to data that involves a capitalist market.
> I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand.
It may be relevant for those people, but I lost all interest in current TV or streaming stuff. I just watch youtube regularly. What's on is on; what is not on is not really important to me. My biggest problem is lack of time anyway, so I try to reduce the time investment if possible, which is one huge reason why I have zero subscriptions. I just could not keep up with them.
I can't think of many situations where that would be particularly valuable, considering it favours recent plays and the cutoff date is already almost half a year old.
Why don't you ask them where the money intended for artists is going? You know? The small insignificant companies of Sony, Warner Music, EMI that own the vast majority of music and own all the contracts?
I just started DJing and something I quickly noticed is how garbage Spotify's music sounds compared to FLACs I've purchased. The max bitrate is very low.
most artists dont really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.
most musicians just want to make music, express themselves, and connect with folks who enjoy their stuff or want to make music with em.
Even some of the largest artists in the world only receive a few grand a year from streaming. Only the top 1% or so of artists get enough streams to even come close to living off it. It isn't that big of a deal. Music piracy isn't the theft people think it is, lars.
youtube is kind of the same way. the real money comes from sponsorships which come from engagement. nobody on youtube is upset that their video got stolen because that mentality was never sold to us to justify screwing us over. musicians, however, were used as pawns so music labels could get more money.
now folks will say stuff like "this is theft" which is just a roundabout way of supporting labels who steal from the artists. so, it's just a weird gaslighting. there's a reason folks turned on metallica over the napster stuff. metallica were being used to further the interests of labels over the interests of fans. and now you're doing the same thing :) It's a script we hear over and over again yet people keep falling for it.
> most artists dont really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.
I think you have it the wrong way round. I'm sure that musicians would love to make money from album / song sales. It's just that between piracy and companies like Spotify, artists make pennies on these activities, so their only choice is to make money on more labor-intensive stuff where they retain more control.
Note that Spotify, somehow, finds it profitable to be in the streaming business.
I think it was Les Claypool (of the band Primus) who said on some podcast that recording a studio album with its attendant very non-trivial costs is really just creating a very expensive business card to hand out to prospective clients.
Back then, that is. It probably cost $250k in 1990 for them to record Frizzle Fry in a studio, handwave $500k in 2025 dollars. But with Bandcamp on a MacBook and some gear from GuitarStudio, round it to $15k and your time. Neither of which is trivial or cheap, but it's not 1990 no more.
> I'm sure that musicians would love to make money from album / song sales.
i think we're actually in agreement. I just don't see streaming as a "must". A lot of musicians I work with and follow also don't see streaming as a must. It's a necessary evil in today's convenience fixated life/culture.
Most musicians I ask about this absolutely fucking hate streaming and don't view it as a real revenue stream.
That's why nearly all merch tables still have CDs, bandcamp links or records for purchase. Artists make more money off a t-shirt sale than they do from 50,000 streams.
I think you slightly misinterpreted what I meant by "selling their music". Or I might have said it poorly.
also, piracy does not mean less money for small artists. evidence suggests the opposite, i think. I think piracy marginally harms record sales for the top 1% of artists while benefiting basically all other artists.
piracy = free exposure. more exposure means more ticket sales, more merch sales, etc. most musicians i know just want people to hear their stuff. piracy enables that for the majority of folks who can't afford to buy every album. i think artists care more about their art being used in commercial stuff without permission/payment, not everyday people checking their shit out.
Unless you're a small potato. Approximately 0% of what I pay for spotify goes to the artists I actually listen to. Fucking Taylor Swift and the Beatles estate don't need my money.
There is a reason people like T Swift and whatnot tour constantly, it's how they make money. Weird Al is known for his amazing live shows, there's a reason for it: they make more money.
When he says "so if I'm doing the math right that means I earned $12" I interpret that as him exaggerating for effect. It's definitely not him citing the pay slip.
"$2 or more per thousand streams, split across rightsholders" seems like an accurate estimate.
Assume an artist (either directly or through a rights holder) makes 1/3 income from streaming, 1/3 from merch and physical albums, and 1/3 from live events.
40m streams per year would be roughly 800k per week. 200k fans worldwide playing 4 times per week on average could get you there. That's like a decent sized but not enormous youtube channel.
200k fans worldwide would also support the ticket sales and merchandise sales aspects.
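The arithmetic above is easy to sanity-check. A minimal sketch, assuming the figures quoted in this subthread ($2 per thousand streams before the split across rightsholders, 200k fans, 4 plays per week, streaming as one third of income); none of these are measured numbers, and the variable names are just for illustration:

```python
# Rough check of the streaming arithmetic discussed above.
# Every input is an estimate taken from the comments, not measured data.
STREAMS_PER_YEAR = 40_000_000
PAYOUT_PER_1000_STREAMS_USD = 2.0  # "$2 or more per thousand streams", before any split
FANS = 200_000
PLAYS_PER_FAN_PER_WEEK = 4
WEEKS_PER_YEAR = 52

# Weekly streams needed to reach the yearly target.
streams_per_week_needed = STREAMS_PER_YEAR / WEEKS_PER_YEAR
# Weekly streams the assumed fan base would generate.
streams_per_week_from_fans = FANS * PLAYS_PER_FAN_PER_WEEK

# Gross streaming payout, still to be split across rightsholders.
streaming_income_usd = STREAMS_PER_YEAR / 1000 * PAYOUT_PER_1000_STREAMS_USD
# If streaming is one third of income, the total is three times that.
total_income_usd = streaming_income_usd * 3

print(f"streams needed per week: {streams_per_week_needed:,.0f}")
print(f"streams from fans/week:  {streams_per_week_from_fans:,}")
print(f"gross streaming payout:  ${streaming_income_usd:,.0f}")
print(f"implied total income:    ${total_income_usd:,.0f}")
```

Under these assumptions the fan base slightly overshoots the weekly target, and the gross streaming payout comes to about $80k per year before labels and other rightsholders take their share.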
99% of that 10 billion went to a handful of artists. Actually, I'd wager nearly half of it went to labels and other middlemen, but that's beside the point. The vast majority of money in the music industry never trickles down, ever.
edit: I looked it up, 70% of spotify's payouts go directly to labels, not artists. So...that $10 bil is nothing.
This is by design and it's the same broken system that metallica defended in the 90s/00s because it benefits large artists while fucking over the other 99%.
We keep repeating the same script using the same busted short term logic.
Not true at all. I support small artists and it's the only way they make money. Ticket sales and merch make up the vast majority of artist revenue for artists who aren't in the top 1%. Most musicians don't make money if they aren't touring or selling merch somehow.
there's also the invaluable aspect of networking that touring allows. bit of a tangent, but it's very important for musicians to network.
The exception are musicians who do production stuff. Think movie/tv scores, commercials, etc. I actually know a handful of artists who used to tour quite a lot but eventually settled down to do production stuff. So they transitioned from touring to make money to production. Touring all year with no healthcare catches up to people.
I know a number of musicians that tour nightclubs, small venues, and festivals.
They make a living; not a luxurious one, but they do OK. They just enjoy making music, and feel that it's worth it. Many of them never even record their music.
>The thing is, this doesn't even seem particularly useful for average consumer
it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?
the average consumer will flock when Spotify ultimately enshittifies
But, Netflix did lose their content by choice! Way back in the 00s, you could pay Netflix something like $5 a month, and they would mail you physical DVDs of almost any movies you could ever want to watch. In fact, my recollection is that the physical library was generally much more extensive than the streaming library, at least through the early ‘10s.
Sure, they had the rug yanked out from under them with digital streaming, but they very deliberately put themselves into that position when they pivoted to streaming in the first place.
I wouldn't call this very effective. It would take an impractically long amount of time to capture a meaningful fraction of the collection and quality would suffer greatly.
Even if you plug the audio output into the input you would still be taking a quality loss by passing the audio through a DAC and then an ADC. Maybe if the quality of your hardware is good enough it wouldn't matter, but then you would be limited to only ripping 24 hours of audio per day...
> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?
I dunno, if they publish like a 10 TB torrent of the most popular music I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.
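For what it's worth, the "10 TB is about 3 million songs" estimate can be unpacked in a few lines. This is only a sketch: the 3.5 minute average track length is my own assumption, not something from the thread or the archive.

```python
# What does "10 TB holds about 3 million songs" imply?
DISK_BYTES = 10 * 10**12      # 10 TB in decimal units, as drives are marketed
SONGS = 3_000_000             # figure from the comment above
AVG_TRACK_SECONDS = 210       # ASSUMPTION: 3.5 minutes per track

# Average file size per song and the bitrate that size implies.
avg_song_bytes = DISK_BYTES / SONGS
implied_kbps = avg_song_bytes * 8 / AVG_TRACK_SECONDS / 1000

# How long it would take to play everything once, back to back.
total_hours = SONGS * AVG_TRACK_SECONDS / 3600
years_nonstop = total_hours / (24 * 365)

print(f"average song size: {avg_song_bytes / 1e6:.2f} MB")
print(f"implied bitrate:   {implied_kbps:.0f} kbps")
print(f"nonstop playback:  {years_nonstop:.1f} years")
```

Under that assumption each song is a bit over 3 MB, which corresponds to lossy audio at well under 160 kbps, and playing the whole disk once would take about two decades of round-the-clock listening.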
It's probably going to make the AI music generation problem worse anyway...
Thank god we are taking care of the “researchers working on things like music classification and generation” ! As long as we can convince ourselves we have a sound analysis of it, no need to support and defend people making actual art right. So much already made, who needs more?
This is not to defend Spotify (death to it), but to state that opening all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify to more fairly compensate the musicians who create the works they train their slop generators with.
Expressing frustration at the pervasive tendency of technologists to look at everything, including art which is a reflection of peoples' subjective realities, with an "at-scale" lens, e.g., "let's collect ALL of it, and categorize it, and develop technologies to mash it all together and vomit out derivative averages with no compelling humanist point of view"
Well, that seems like a pretty reasonable thing to be pissed off about, thanks for taking the time to elaborate.
I think the overlap between the bureaucratic technologies developed by people who, by all accounts, are genuine lovers of the subjectivity and messiness of music qua human artistic production (e.g. the algorithmic music recommendation engines of the '00s and early '10s; public databases like discogs and musicbrainz; perhaps even the expansive libraries and curated collections in piracy networks like what.cd), and the people who mainly seem interested in extracting as much profit as possible from the vast portfolios of artistic output they have access to (e.g. all of Spotify's current business practices, pretty much), should probably prompt some serious introspection among any technologists who see themselves in that first category.
I read an essay a number of years back, which raised the point that, if you're an academic or researcher working on computer vision, no matter how pure your motives or tall your ivory tower, what do you expect that research to be used for, if not surveillance systems run by the most evil people imaginable. And, thus, shouldn't you share some of that moral culpability? I think about that essay a lot these days, especially in relation to topics like this.
How does Spotify defend people who actually make art? There's virtually no difference between pirating and streaming through Spotify for the vast majority of artists.
Yes they do use DRM. I know they are using Widevine on the web player, but possibly other ones too (never looked very far). Not sure for the app, it might be that it is using OGG streams with a custom DRM (which is probably the one some existing downloaders actually (ab)use).
Let me start over. Youtube itself has DRM required for certain videos, and certain formats of videos.
The 256 kbps format for music will be protected by DRM. If you do not have DRM available, youtube will fall back to a lower quality format to play the audio.
Music might have higher-quality audio-only files as provided, whereas Youtube might have the audio combined with video and a generic compression algorithm applied, as with all other uploaded videos.
Just like with anything digital you (and Spotify) are fully at the mercy of the rights holders. When (not if) they pull their stuff, or replace their stuff, or change their stuff, you can never get the original back unless you preserve it.
Largest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war, and Spotify pulling out of Russia. So they don't have the licenses to a lot of stuff because that belongs to companies operating within Russia.
DRM aside, Spotify clearly should have logic that throttles your account based on requests (only so many minutes in a day..), making it entirely impractical to download the entirety of it unless you have millions of accounts.
This is probably how they did it: over time, they used a few thousand accounts, queued up all the things, and downloaded everything over the course of a year.
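A quick plausibility check on the "few thousand accounts over a year" guess. This is a sketch under loud assumptions: the track count is the roughly 186 million figure cited elsewhere in this thread, the 3.5 minute average track length is made up, and the throttle is modelled as "one account can fetch at most 24 hours of audio per day", which is a hypothetical limit and not a documented Spotify policy.

```python
import math

TRACKS = 186_000_000               # figure cited elsewhere in this thread
AVG_TRACK_SECONDS = 210            # ASSUMPTION: 3.5 minutes per track
AUDIO_HOURS_PER_ACCOUNT_DAY = 24   # ASSUMPTION: throttle equal to real-time playback
DAYS = 365

# Total audio duration in hours across the whole collection.
total_hours = TRACKS * AVG_TRACK_SECONDS / 3600
# One "account-day" is the audio a single account may fetch in one day.
account_days = total_hours / AUDIO_HOURS_PER_ACCOUNT_DAY
# Accounts needed to finish within the chosen number of days.
accounts_needed = math.ceil(account_days / DAYS)

print(f"total audio:     {total_hours:,.0f} hours")
print(f"account-days:    {account_days:,.0f}")
print(f"accounts needed: {accounts_needed:,}")
```

With a real-time throttle this lands in the low thousands of accounts for a one-year run, so the guess is at least the right order of magnitude. A stricter throttle scales the account count up proportionally.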
>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.
They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook annual revenue is about twice that of the entire global recording industry. The strategy these companies took was probably correct but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief they never will have to.
> They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook [...]
I think it's pretty clear from history that they are too big to have to pay out a huge settlement.
First, they never had to. There was never a "huge" settlement, nothing that actually did hurt.
Second, the US doesn't do any kind of antitrust, and if a government outside the US tries to fine a US TooBigTech, the US will bully that government (or group of governments) until they give up.
I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI. Even if AA themselves don't know it and just take the cash.
> The thing is, this doesn't even seem particularly useful for average consumers/listeners
Yeah. To me it is not really relevant. I actually was not using spotify, and if I need to have songs I use yt-dlp for youtube, but even that is becoming increasingly rare. Today's music just doesn't interest me as much, and I have the songs I listen to regularly. I do, however, also listen to music on youtube in the background; in fact, that is now my primary use case for youtube, even surpassing watching movies or anything else. (I do use youtube for getting some news too though; it is so sad that Google controls this.)
To put this into perspective, What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth. What.CD had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.
I think what earned what.cd that title wasn't necessarily just the amount but the quality, as you mentioned, as well as the obscurity of a lot of the offered material. I remember finding an early EP of an unknown local band on there, and I live in the middle of nowhere in Europe. There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown. It was the equivalent of vinyl crate digging without physical restrictions.
Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.
> There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown.
Music licensing (in the US at least) is actually pretty nice for this (from the licensee perspective anyway). There are mechanical licenses which allow you to use music for many uses without contracting with the rightsholders and clearinghouses whose job is to determine where to send royalties. So you can use the music and send reporting and royalties to the clearing houses and you're done.
Of course, you may want to contract with the rightsholders if you don't like the terms of the mechanical license; maybe it costs too much, etc. If you're Spotify or similar and you have specific contracts for most of the music, and have to pay mechanical license rates for the tail, it might make sense to do so in order to boast of a larger catalog.
It's Redacted.sh, a.k.a. RED. They have around three million torrents. But like What.CD, Redacted.sh is a private tracker, so you can't just jump in and see the content.
Another comment mentioned Redacted.sh as a successor. I haven't used it. I'm sure there's a subreddit around that can help. Looks like orpheus is another option if I'm reading correctly. You have to get an invite or pass an "interview" though, so be prepared to wait a while.
Yeah, What.CD had a bunch of the local Brisbane post-rock bands from the 00s on there which was amazing to me. I at least have copies of a lot of their records!
True but What.cd had a tremendous amount of notable music not available on Spotify though because it was also sourced from cds, bootlegs, vinyl, tape etc whereas Spotify only includes music explicitly licensed for streaming.
This is true and a category of music that got hit notably hard was live recordings. What had a wide array of live recordings made by sound engineers straight from the mixer. This is something that you simply cannot find now unless you maybe know a guy.
That's why I use YouTube Music as my streamer as they allow damned near anyone to upload any old rare record and then figure out the royalties somehow.
Yeah, it was a great place. I have a paid Spotify account but finally got an ancient hard drive onto my network for all sorts of stuff Spotify doesn’t or can’t have (e.g., Coldcut: 70 Minutes of Madness).
Redacted.sh is a worthy successor, but the average person just doesn’t care about “which release is best” anymore. I use YT Music as a backup but Redacted is my main source of music these days.
You can’t talk about what.cd without talking about its precursor OiNK's Pink Palace. Even Trent Reznor was public about what an amazing place it was. Music aside, the community existing just for the shared love of music and not for any other kind of monetary or influencer gain is what set it apart. We just don’t have those kinds of communities for music online anymore
>We just don’t have those kinds of communities for music online anymore
They're still kind of around, but yeah, everything is very much on its way out in the music scene, at least in terms of that late 90s early 00s culture. Or has been until recently. There is a renewed interest in self-hosting and "offline" style music collections.
It sucks too. The way folks discover music is important. The convenience of streaming has led to some interesting outcomes. When self-hosting music comes up this is always one of the top questions people have: How do you find new music?
The answer isn't that hard and really hasn't changed much. People just don't want to spend any time or effort doing it. Music stores still exist, they're amazing. Lots of 2nd hand stores carry vinyl and CDs now, which can give you great ideas for new music. There are self-hosted AI solutions and tools. Last.fm and Scrobbling are still very much around. My scrobble history is so insanely useful. There are music discords. Friends. Asking people what they're listening to in public. Live shows with unique openers (I once went to a Ben Kweller show with 4 opening bands, I still listen to 3 of them.)
I love that SoulSeek still exists in some format. My path was Napster (made me get cable Internet and a cd burner) > AudioGalaxy (learned how to path things on routers so I could download music to home from work) > SoulSeek. Plus it had some useful chat and people who cared about sound quality and metadata.
Soulseek has to be the best kept secret on the internet. Even people my age who grew up with things like Napster, Limewire, and even soulseek, don’t know that it still exists.
That being said, I have a lot of non-mainstream tracks in my playlists on YouTube Music that have YouTube comments along the lines of “I wish this was available on Spotify :’(”. I bet the same goes for What.CD.
So there’s some way to go for a comprehensive music archive.
It is not hard. But please don't misuse it and ruin the fun for everyone. It is nice to be able to use the music relatively easily for hobby projects. My music server has functionality to play tracks from Spotify this way:
There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.
I just found out that https://annas-archive.li/ is masked by my German internet provider (SIM.de/Drillisch).
I usually use a VPN but I had it switched off temp. to watch Fallout (Prime Video won't let you watch through a VPN). Only when I switched Mullvad back on could I open the site.
In that vein, I am trying to find out why searching for
alextud popcorntime
which should trivially yield http://github.com/alextud/PopcornTimeTV results in anything but that one particular URL in every search engine: Google, Kagi, DuckDuckGo, Bing
They even find a fork of that particular repo, which in turn links back to it, but refuse to show the result I want. Haven't found any DMCA notices. What is going on?
I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.
There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.
Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?
This is something really important, especially in the days when music and film vanish from platforms one by one. I myself have three playlists with greyed out titles (titles are missing so there's no possibility for me to find out what was there).
That's why I divide music into the kind that I want to have forever - I buy it on CDs - and dance music that I can live without one day.
Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, and selectively download parts of the torrent in response to on-demand streaming requests from the client.
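To make the "selectively download" part concrete, here is a rough sketch of the piece arithmetic, assuming the usual multi-file torrent layout where files are laid end to end and cut into fixed-size pieces. The file sizes and piece length are made up for illustration, and `pieces_for_file` is a hypothetical helper, not part of any real client.

```python
from itertools import accumulate

def pieces_for_file(file_sizes, index, piece_length):
    """Return the inclusive (first, last) piece indices covering file `index`.

    Assumes files are concatenated in listing order and the resulting byte
    stream is split into pieces of `piece_length` bytes.
    """
    if file_sizes[index] == 0:
        raise ValueError("an empty file occupies no pieces")
    # offsets[i] is the byte position where file i starts in the stream.
    offsets = [0] + list(accumulate(file_sizes))
    start = offsets[index]
    end = offsets[index + 1] - 1  # last byte belonging to this file
    return start // piece_length, end // piece_length

# Illustrative numbers only: three tracks and 1 MiB pieces.
sizes = [5_000_000, 3_500_000, 4_200_000]
piece_length = 1_048_576
print(pieces_for_file(sizes, 1, piece_length))  # (4, 8)
```

A streaming frontend would ask the swarm for just those pieces. Note that the first and last piece are usually shared with the neighbouring files.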
The person who wrote this Spotify p2p software also wrote uTorrent, which was bought by the company BitTorrent after they struggled to make a C++ client on their own. The original BitTorrent implementation was in Python, but they re-skinned uTorrent as BitTorrent and shipped both for a few years.
https://en.wikipedia.org/wiki/Ludvig_Strigeus
I recently got into the whole homelab *arr stack for things like movies and tv and while I know options exist for music I just don’t see the need yet price-wise. Spotify is still just cheap enough for me to not care enough. We’ll see how long this holds.
That being said it’s no secret Spotify and other streaming services barely pay even popular artists. Artists make money from live shows and merch. The fact that their music is behind a paywall at all could mean they make less money from some lack of exposure.
I do hope one day self-hosting music with an extremely easy setup with torrenting for sourcing is set up again. What I’m talking about exists to some extent, but it’s not trivial for most people.
Hmm. This is actually not really something I need, I think; but
I consider anna's archive etc... as about as important as the
internet web archive. We need to preserve data, at the least
important data, also historic data - how the original websites
looked. Creativity of past generations. Same for games and books.
It may be only ~30 years for webpages to have emerged, but there
are also many young people who may not have experienced that since
they are too young to have experienced it. There is always a
generational change; our generation has the opportunity to store
more things.
> We're curious about the peaks at whole minutes (particularly 2:00, 3:00, 4:00). If you know why this is, please let us know!
As a hobby video/audio editor, people will start with their track taking up a preset amount and fill up the time - even if it means having some dead space at the end.
The other alternative is algorithmically created music.
I've heard 2:00 is some kinda sweet spot for the Spotify algorithm and payouts? You get paid per play so you don't want it too long, but if your track is much shorter than two minutes you get penalized or something. I know they've had to remove ambient tracks that were cut into 40 second clips as part of this.
So you might see a lot of anchoring just like YouTube videos kept stretching to almost exactly ten minutes?
Hmmm I don’t like this. There are sources for music with better quality out there and all this will do is paint them a bigger target for takedowns/prosecution. I am worried about losing their ebook library. Quoting from the announcement: “Generally speaking, music is already fairly well preserved.“ They should have done this as a separate identity.
Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.
This is some of the greatest news I've ever heard for the digital preservation community. Just so many projects over the years could have used resources like this. Thank you for contributing to humankind!
This is incredible. I once assembled a collection of 100,000 tracks for research on exploration of large music libraries. Essentially vector search. I was limited in storage and processing power to a single machine.
If I were to do it today, I could get so much farther with hyperscaler products and this dataset.
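For anyone wondering what "essentially vector search" means here, a minimal brute-force sketch, assuming each track has already been turned into a feature vector somehow. The track names and three-dimensional vectors are invented for illustration. A real library would use learned audio embeddings and an approximate index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, library, k=2):
    """Return the names of the k tracks most similar to `query`."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]

# Hypothetical embeddings, not derived from any real audio.
library = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.9, 0.1, 0.0],
    "track_c": [0.0, 1.0, 0.0],
}
print(nearest([1.0, 0.0, 0.1], library))  # ['track_a', 'track_b']
```

This scans every vector per query, which is fine for 100,000 tracks on one machine but not for tens of millions.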
I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.
It seems that the metadata doesn't include the lyrics, probably because they are provided by Musixmatch. It would have been nice to have a database of lyrics linked to ISRCs. AFAIK Lrclib doesn't support downloading lyrics for a given ISRC.
Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.
Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
As a belated followup, I should observe that if you're playing "in C sharp minor" on the black keys, you're skipping notes 3, 6, and 7 of the scale... and those are the only notes that differ between a minor scale and a major scale, making the "minor" designation completely meaningless.
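The scale-degree claim is easy to check mechanically. A small sketch, using sharps only (so Db is spelled as its enharmonic C#) and the natural minor scale. The helper names are mine.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK = {1, 3, 6, 8, 10}          # pitch classes of the black keys
MAJOR = [0, 2, 4, 5, 7, 9, 11]    # semitone offsets of the major scale
MINOR = [0, 2, 3, 5, 7, 8, 10]    # semitone offsets of the natural minor scale

def white_key_degrees(tonic, intervals):
    """Scale degrees (1-based) that land on white keys for this tonic."""
    root = NOTES.index(tonic)
    return [degree for degree, step in enumerate(intervals, start=1)
            if (root + step) % 12 not in BLACK]

def differing_degrees():
    """Scale degrees where major and natural minor disagree."""
    return [degree for degree, (maj, mnr) in enumerate(zip(MAJOR, MINOR), start=1)
            if maj != mnr]

print(white_key_degrees("C#", MINOR))  # [3, 6, 7]
print(white_key_degrees("C#", MAJOR))  # [3, 7]
print(differing_degrees())             # [3, 6, 7]
```

So on black keys alone, C# minor loses exactly the degrees that distinguish it from major, while Db/C# major loses degrees 3 and 7.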
Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.
There is a sweet spot for the bass. Lower is better for deep bass, but too low and it stops being a recognizable note, and consumer speakers can't reproduce it. This effect exists though I'm not sure if it is the cause of the pattern here.
C# I don’t believe was/is a common tuning for most western instruments, classical or modern.
A digital piano can transpose things to make it “easier” to play.
Cursory google search says that a sitar is traditionally tuned to something useful for c#
I’m curious if C# is one of those notes that lines up nicely with whatever crappy consumer stereos/subs were capable of reasonably reproducing in the 90s as electronic music was taking off and it stuck around as a tribal knowledge for getting more “oomph” out of your tracks.
Unrelated, but I just can't stop myself from saying that I absolutely hate Spotify even though I'm a paying customer. Fuck you Spotify. You were supposed to be a convenient way to discover and listen to music. Now you are only convenient for listening to music, and absolutely terrible for any recommendations. This is sad really. Spotify had good recommendations. It's absolutely in a position where it can provide good recommendations — it has both a vast music library and a vast amount of data on user preferences. And it chooses to push procedural/ai-generated slop instead to earn more money. I thought that maybe buying $SPOT stock will make me more at peace with its greed, but it didn't work. Spotify fucking deserves to crash and burn because it sees paying customers as idiots who might not notice they are fed garbage. Fuck you Spotify, fuck you.
YouTube Music works pretty well for me. One great feature is that it includes not just a commercial music streaming catalog, but all user uploads of music on YouTube.
I had to chuck Youtube Music away when it was polluting my youtube playlists with stuff I was liking on youtube music. Me as a video viewer and me as a music listener are two completely different people.
and you can upload 100,000 of your own tracks to the service for your private use as well. It is a great service considering I am getting it as a side effect of youtube premium. Single handedly the last subscription I would cancel.
I always find these takes curious because they could not be further from my experience. I'm still discovering tons of good music. Perhaps it's specific to genres, but I haven't encountered any generated junk tracks.
Really? How about asking google to "play bloomberg news on spotify" next time. Then see if you can remove the resulting chaos from your history so it won't start feeding you slop.
This is more frequent than you would assume. I’ve neither subscribed to Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.
Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.
I really don't understand how focusing on source quality files is supposed to be a "major issue" with the music preservation community. It's bizarre for them to talk about these being barriers to creating a "full archive of all music that humanity has ever produced" and have their answer be scraping Spotify to end up with a music library comprised of many AI and bulk produced songs at 75/160kbps.
It would be interesting to find out how that has changed with the growth of the music industry over the years. I suspect that many of these <1000 streamed could be artificially generated for monetary purposes but I'm not entirely sure. That being said, there is a lot of good music with less than 1000 streams. I've been looking myself and I've definitely found some hidden gems.
I just want to be able to backup my playlists. Maybe that's possible but last time I looked I could only find sites that wanted your login, not gonna happen.
This is where ChatGPT shines. Just ask it to write you a script, it'll give you all the instructions.
I've used ChatGPT to write a whole bunch of playlist logic scripts (e.g. create a playlist that takes tracks from playlists A, B and C, but exclude tracks in playlist D.)
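The playlist logic itself is only a few lines once you have the track IDs. A minimal sketch, with fetching from and writing back to Spotify deliberately left out. The IDs are placeholders and `combine_playlists` is a hypothetical helper.

```python
def combine_playlists(sources, exclude):
    """Merge playlists in order, dropping duplicates and excluded tracks.

    `sources` is a list of playlists (each a list of track IDs) and
    `exclude` is a single playlist whose tracks must not appear.
    """
    excluded = set(exclude)
    seen = set()
    result = []
    for playlist in sources:
        for track_id in playlist:
            if track_id in excluded or track_id in seen:
                continue
            seen.add(track_id)
            result.append(track_id)
    return result

# Placeholder IDs standing in for playlists A, B, C and D.
a, b, c, d = ["t1", "t2"], ["t2", "t3"], ["t4"], ["t3"]
print(combine_playlists([a, b, c], d))  # ['t1', 't2', 't4']
```

Keeping a list plus a `seen` set preserves the original ordering, which a plain set difference would lose.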
Uh, cool, I guess? I want to applaud that, but, first off, unless you are OpenAI or Facebook, it is not exactly plausibly easy to participate in the festivities. Even if I had spare 300 TB laying around, how the fuck do I download that?
But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.
Anna's archive mirrors z-lib and libgen, so those are the main alternatives. But it's unlikely anna's archive would go down so easily, they take a lot of precautions.
a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.
it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted jellyfin". Where a person hosts a thin client on a machine that then fetches the data from the torrent, then allows the user to "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.
I don't really see why it would, from an end user perspective, be any different than a self hosted jellyfin or plexamp.
I am in no way saying that this is cheap but 300 TB will set you back a little less than $6k with tax. Very attainable for people other than OpenAI and Facebook. And it's not crazy at all to snag a server with enough bays to house all those.
For reference, considering you can purchase a 12-month Spotify Premium subscription via a $99 gift card at the moment, that same $6k could be used for 60 years of Spotify Premium.
For reference, considering the backup has 86 million music files, at an average of 3 minutes per file it would take you around 490 years to listen to all the tracks.
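Both back-of-the-envelope figures check out. A quick sketch using only the numbers quoted in the two comments above ($6k of storage, a $99 annual gift card, 86 million files at 3 minutes each), assuming non-stop listening and 365-day years:

```python
# Storage budget expressed as years of Spotify Premium.
hardware_budget = 6_000   # dollars, as quoted above
premium_per_year = 99     # dollars, 12-month gift card as quoted above
years_of_premium = hardware_budget / premium_per_year

# Time needed to play every file back to back.
tracks = 86_000_000
minutes_per_track = 3
minutes_per_year = 60 * 24 * 365
years_of_listening = tracks * minutes_per_track / minutes_per_year

print(round(years_of_premium, 1))    # 60.6
print(round(years_of_listening, 1))  # 490.9
```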
Interesting if that is considered to be copyrightable. Any white noise track is perceptually indistinguishable from another, but none have the exact same sequence of samples except by chance, or if the noise generator happens to be deterministic as a function of time.
I find it so odd that people turn to streaming services for stuff like this. I have a dedicated white noise machine, and when I travel, I use the white noise (bright noise actually) built into the iPhone.
Relying on an external hosted service would never cross my mind, and surely wouldn’t be something I go to on a daily basis.
You might find it interesting that there's an entire genre of youtube video that's designed to just be chucked one by one into slideshows for elementary school teachers to use as their lesson plan. Including videos that are just "2 minute timer for kids!"
Attracting the ire of the music industry seems like a huge, unnecessary risk. I wish they had performed this as some kind of other entity to try to keep the ebook archive protected from the fallout. I fear this will not end well.
I want to time-travel back to 2000 like Old Biff with the sports almanac so I can tell Shawn Fanning to use the "it's for historical preservation" defense.
Looking at the analysis, I'm totally surprised opera and psytrance are so prolific.
Psy-trance... I thought it was the same as any other electronic genres, but do people get high and just start shoveling psy-trance tracks out or something?
Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?
I mean, I'm sure there's some misclassification, but chamber music is basically a couple people with any sort of music training on classical instruments so that doesn't surprise me nearly as much... I can easily imagine there being _lots_ of those, and you might come up with a different artist name for each unique set of people you collaborate with.
Former classical singer here. Only theory I can come up with is that opera tends to have large casts where all the singers are credited individually which would inflate the absolute numbers of "artists" relative to other genres. I still struggle to imagine this accounting for bringing such a niche genre to the top here.
> Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?
My guess is just the same opera performed by a ton of different orchestras, and perhaps the same orchestra for different recordings, times however many operas there are.
Currently it says they have released metadata and album art. Is archiving and sharing the textual track metadata alone (no images, no audio) legal in the US, or Europe? By what basis is it legal or illegal?
Monopoly is not a nice thing. Maybe it is convenient, but not nice.
The people who give money to artists are the ones going to concerts and buying music directly from artists. Spotify gives cents to artists, incentivizing awful behaviour (AI music, aggressive marketing, low effort art...).
Some people's urges to destroy all traces of human civilisation astonish me. What do you think Spotify is going to do with all its music when it ceases to exist in however many years? No, we must collectively feed Daniel Ek the Hungry.
All torrent clients must necessarily support partial downloads because of the nature of torrents. The files are split into pieces which are downloaded and then assembled by the torrent client.
great. Spotify just removes things all the time (things I actively listen to and work on for my jazz practices, one day just go "poof" because they didn't want to pay the record company anymore), and they are not as a company deserving of the role of "keeper of all the world's music". They don't give a shit and they'd vastly prefer we all listen to their AI generated royalty free crap and Joe Rogan.
Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.
I mean... not really? Not much music is Spotify exclusive (at least from the 99.6% of what people listen to mentioned in the article), and from friends in the industry I can guarantee you all major content platforms (Netflix, Disney+, Prime Video, a large chunk of YouTube) have already been completely copied without a business agreement with the rightsholders by AI startups and big-name players.
When I left my apartment back in 2018, I was switching the Comcast account over to my housemate who was staying on there. In doing so I discovered I had a myname2342@comcast.com email account. The UI showed something like 8,000 unread emails. Bemused, I opened it to see what kind of spam it had accumulated. None at all! It was just under 8,000 DMCA / torrent warning emails from Comcast itself. "We know you torrented The.Pokemon.Movie.2001.h264.mkv, you better stop that!"
A full year of these emails and nothing more than that ever happened.
(if you're wondering how I hit 8000 torrents, the answer is individual album torrents)
This reinforces my belief that this effort ("anna's...") is financially backed by Russia/Putin. The HN crowd probably won't see it though.
Think from a geopolitical perspective, not (just) a "copyright shouldn't exist" perspective. They claim "communism" as a motivation; Putin is looking to re-establish the Stalin Soviet Union.
> Why would you want to destroy your enemies' industries, is what you're asking?
Do you have any evidence that pirating is destroying industries? My guess is I can find the majority of this release by anna's archive on some combination of the pirate bay and the soulseek, or private music trackers. And yet, Spotify is still a thriving company, as is the entire music industry as a whole. There's even room for competing streaming services like Tidal and Youtube Music.
Am I understanding this wrong? Ripping the metadata I'm fine with. But it sounds like they've ripped every song from Spotify and they're going to release them?
Edit: It seems like they are. Stealing from tens of thousands of artists, big and small, and calling it "preservation" or "archiving" is scummy.
The people I know who go through the trouble of pirating and downloading vast libraries of music are all musicians themselves, or at the very least total music nerds. They don’t want to lose access to their stuff, plus if they ever need to import audio into a DAW, DRM is a no-go. They are the same people who spend large amounts of money on vinyls, and support smaller independent artists through concerts, merch and (back in the day) CDs.
It used to be more mixed, but today, piracy is often the only option to ”own” any media at all.
It's both. Musicians and music nerds buy CDs and LPs and tapes and Bandcamp files and they "pirate" music both because they care about ownership and quality and rare or substantially different editions of records that aren't available legally, and because they've seen the sausage factory from the inside and know that "stealing" $0.02 from an artist who's starving like them anyway isn't really that far up on the list of heinous crimes. Buy the shirt, download the album. No one cares.
Music piracy is already a thing, not to mention you don't even need to torrent nowadays when music is available for free on YouTube. Those who don't want to pay already don't pay so nothing changes there.
The value of Spotify is the convenience, and this collection does not change that in any way. Your argument would apply if someone were to make a Spotify clone with the same UX using this data.
At least pirates provide some value from curation usually. In this case the leak is just all of Spotify. It makes it really easy for a competitor to just duplicate the Spotify service without paying licensing fees. Tbd what happens.
Because it's not stealing. Stealing is a problem because it deprives the original owner of the item - whether the thief subsequently uses the item or not doesn't change that.
This doesn't apply to dematerialized content: the original copy still exists. The only negative impact occurs if someone decides to actually use the pirated copy in place of buying a licensed one.
The mere existence of this new pirate copy being around doesn't automatically imply that, especially if other, more convenient sources are available.
Okay, call it copyright infringement then if you want to be a stickler on definitions. It's still wrong and existing instances of it don't make it justifiable to do.
Spotify can shut down any day. Even if it survives, it's removing content all the time. How are future generations supposed to study and listen to music if it is lost? Imho, someone has to do it.
Why is this stealing? You can already listen to everything that's on Spotify with a free account. You are free to also record the audio while it's playing. I suppose grabbing the actual file shouldn't matter? Or is this about releasing? And robbing people of plays they would otherwise get through Spotify?
If you listen to something on Spotify with a free account the artists still get paid. This isn't a case where you're ripping off some mega-corp. You're ripping off thousands of artists from major label ones to tiny indies. Take the metadata and build something cool. Stealing the files and releasing them is something else entirely.
You can record what you play from Spotify and you are already free to play the record again and again and again without the artist being paid.
Most people do not because they find it less convenient than paying 20 bucks a month or whatever is the current price in 2025 but that doesn't change the reality.
For most people the appeal of Spotify is not the music itself but the playlists that are shared thanks to its ubiquity. This is the reason other services struggle to make a dent even if they have better quality, UI and algos.
Spotify started by disrupting the market using pirated music by the way so you are pretty much endorsing and encouraging piracy when "paying" your favorite artists through Spotify.
> What’s actually scummy is Spotify paying artists $1 per 1000 streams.
My spotify wrapped says I listened for 50,000 minutes this year. Assuming 2 minutes per song, that's 25,000 streams. I paid them $110, aka $0.004/stream. Assuming I'm a typical user, they obviously could not afford to pay any more than that per stream.
I googled "spotify pay per listen" and the first result is a reddit comment saying "The average payout on Spotify is only $0.004 per stream." The google AI overview says "Spotify [..] pays artists a fraction of a cent, typically $0.003 to $0.005 per stream". So I'll assume it's something in that ballpark.
So it seems like Spotify's payouts are completely reasonable, given their pricing. Is my logic wrong somewhere?
That’s fun math. I just checked mine: 96000 minutes. 2 minutes per song is way too generous as an assumption, for me everything seems to be > 3 minutes so ~20000 streams.
I’m paying for a family account (that’s around 250/year) and there are 5 people on it so my usage is 1/5th of that (50/year)
So that’s 0.0025€ per stream. I don’t think your assumption is unreasonable.
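Spelling out the arithmetic from both comments, using only the figures they quote. The function name is mine, and this says nothing about how Spotify actually splits revenue.

```python
def revenue_per_stream(paid_per_year, streams):
    """Subscription money available per stream, before any platform cut."""
    return paid_per_year / streams

# First comment: $110/year, 50,000 minutes at an assumed 2 minutes per song.
first = revenue_per_stream(110, 50_000 / 2)

# Reply: one fifth of a 250/year family plan, roughly 20,000 streams.
second = revenue_per_stream(250 / 5, 20_000)

# Average track length implied by 96,000 minutes and ~20,000 streams.
implied_minutes = 96_000 / 20_000

print(round(first, 4))    # 0.0044
print(round(second, 4))   # 0.0025
print(implied_minutes)    # 4.8
```

At exactly 3 minutes per track the reply's 96,000 minutes would be 32,000 streams, or about 0.0016 per stream, so the ~20,000 estimate implies tracks averaging closer to 4.8 minutes.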
In most cases, they couldn't make that decision even if they wanted to. Only independent artists and those that are so large as to have enough sway (Neil Young for example) would be able to. The vast majority of artists you probably listen to don't actually own the rights to their own music.
So let the rights holders make the decision? They would never. Music rights exist for them to extract profit above all else. They don't care about preserving culture or legacy. Which is why it's important that somebody does.
While I wouldn't call this scummy I do agree with your sentiment. It is technically stealing and those copyrights should be respected.
Full disclosure, I am a career musician AND have been known to pirate material. That said, I think this is a valuable archive to build. There are a lot of recordings that will not endure without some kind of archiving. So while it's not a perfect solution, I do think it has an important role to play in preservation for future generations.
Perhaps it's best to have a light barrier to entry. Something like "Yes, you can listen to these records, but it should be in the spirit of requesting the material for review, and not just as a no-pay alternative to listening on Spotify." Give it just enough friction where people would rather pay the $12/month to use a streaming service.
Also, it's not like streaming services are a lucrative source of income for most artists. I expect the small amount of revenue lost to listeners of Anna's Archive are just (fractions of) a penny in the bucket of any income that a serious artist would stand to make.
It is technically not. Stealing means you have a thing, I steal it, now I have the thing and you do not. You can’t steal a copyright (aside from something like breaking into your stuff and stealing the proof that you hold the copyright), and when a song is downloaded the original copyright holder still has their copy.
Calling piracy theft was MPAA/RIAA propaganda. Now people say that piracy is theft without ever even questioning it, so it was quite successful.
See my other comment. Identity theft is the bank being defrauded and passing the problem onto you. They are the victim, not you and it is their money that’s gone, not yours.
IP theft is more like espionage and possibly lost hypothetical revenue. Again, it isn’t larceny, burglary, etc. You still have the knowledge, it’s just that so does the perpetrator.
Moreover discussions of IP gets into whether it even makes sense to be able to patent algorithms which are at their core just mathematics. So before you can talk about stealing the quadratic formula you need to prove that the quadratic formula is something that can be property.
You may not be stealing the actual content, more so “making a copy”, but in doing that you’re taking away money the artist would have earned if you bought their album or streamed it on Spotify (admittedly that’s a very small amount for the artist but that’s another thing)
And if I stole something physical you had for sale, you wouldn’t make the money, so the end result is effectively the same.
The “if you bought their album” is the non-trivial part of that sentence. A pirate is not necessarily going to fork over $20 for an album if they couldn’t pirate. Chances are they will simply not buy the album. In either case the artist doesn’t get their $1.20 (6% to the artist the rest to the studio and distributors). So the result is really not the same because the artist and the pirate can both have the album in different ways and in both cases the artist doesn’t get their $1.20 unlike a physical good which cannot be cloned.
What this really is exposing is that most art is not worth the same. A Taylor Swift album is not worth the same on the open market as a Joe Exotic album. Pricing both at say $20 is artificial. Realistically most music has near zero actual value, hence why if you are a B tier or lower artist you won’t make much compared to an A tier artist on platforms like Spotify or YouTube which pay per listen/watch.
Can you post your social security number and other personal info here then? You will still have it afterwards!
Oh also, I don't see why I should ever pay for trains or movie tickets if there are seats available. I can just walk in! The event will happen anyway. Its not stealing.
Everyone should just download all art, music and literature for free. Musicians, artists and writers can all make money some other way while I enjoy the works of their efforts.
What the music/movie industry was claiming in court was not theft. There is no statute that identifies piracy as theft. They were claiming copyright violation and wanted to collect damages for lost revenue.
You are bringing up “identity theft” which is also not theft. If you post your PII here and I use it to open a credit card in your name and then spend a bunch of the money using that card on buying goods and services, you are not the victim. What I do in that case is defraud the bank. They are the ones who are the actual victim and in the ideal world they would be the ones working with the authorities to get their money back.
Of course they would rather not do that so they invented a crime called identity theft and convinced everyone that it is ok for them to make you the victim. They make your life hell since they can’t find the actual criminal while you spend thousands of dollars trying to prove that you don’t owe thousands of dollars. But in reality you were not any part of the fraud. It is on the bank to secure their system enough to prevent this. But they have big time lawyer money and you don’t so here you are.
Agree with you, this release is obviously a scummy thing to do.
Same as if someone released every book on Kindle for free. There are rules. Project Gutenberg is great. They don't just steal every book they can.
Not to mention the organization is openly trying to profit from this data by selling it to big tech orgs for AI training! None of the artists consented to that, I am sure, to say nothing of Spotify's interests.
Yuck. Just to make it easier to train slop machines. The point of art is not to have completionist archives of EVERYthing that’s ever been made! Let it die. Death is the most natural part of life. Art is about the human experience, not “for researchers”.
The point is human connection. Art is a living reflection and record of human experience.
Art will persevere- the kinds of folks who prioritize what they like based on popularity were never the supporters artists (contrast with craftspeople trying to make a buck) counted on in the first place. Enjoy your derivative slop - we’ll continue on our imperfect, messy, individual, human artistic lives.
I am having a lot of trouble following you. Something has upset you: what would make you feel better?
do you mean that researchers should be disallowed from accessing art?
I do not see how research interferes with all the benefits you prioritise. Can't you continue to enjoy those benefits?
Many people think 'real' music has electric guitars. I think they're wrong, but why argue with them? I think it's fine if you do not like music made from music, but that ship sailed last century. One detail you may be missing is that there are imperfect messy individual artistic humans who make music from music too. Computers are no more an obstacle to human connection through music than electric guitars are.
> I am having a lot of trouble following you. Something has upset you: what would make you feel better?
Don't talk to people like that here, please. It's passive aggressive and unproductive. GP's comment was fine, if a bit impassioned, regardless of whether you agree with it.
This is insane.
I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?
> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
I wouldn’t be so sure. There are already tools that locate and stream pirated TV and movie content automatically and on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.
The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
> The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
They have a page directly addressed to AI companies, offering them "enterprise-level" access to their complete archives in exchange for tens of thousands of dollars. AI may not be their original/primary motivation but they are evidently on board with facilitating AI labs piracy-maxxing.
You go where the money is. Infra isn’t free. Churches pass the plate every Sunday. Perhaps one day we’ll exist in a more optimal socioeconomic system; until then, you do what you have to do to accomplish your goals (in this context, archivists and digital preservation).
> Infra isn’t free.
There is a certain irony in people providing copyrighted works for free justifying profiting from these copyrights on the basis that providing the works to others isn’t free.
I'd have a lot more sympathy if the music industry didn't try all of the worst available options to handle piracy for years and years.
They had many opportunities to get out ahead of it, and they squandered it trying to cling to album sales where 11/13 tracks were trash. They are in a bed of their own making.
You have been able to buy DRM free digital music from all of the record labels since 2009 from Apple and other stores.
You've been able to buy DRM free digital music since the 1980s.
> DRM free digital music from all of the record labels
Is this true? Can you show me where I can get DRM-free releases from Mountain Fever?
Better yet, can you add that information here? https://pickipedia.xyz/wiki/DRM-free
Your link doesn’t work. But I assume you are talking about this label? I looked at the first artist and I found the artist’s music on iTunes. Everything that Apple sells on the iTunes Music Store has been DRM free AAC or ALAC (Apple lossless) since 2009.
https://mountainfever.com/colin-kathleen-ray/
While ALAC is an Apple proprietary format, it is DRM free and can be converted to FLAC using ffmpeg. AAC is not an Apple format.
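For anyone who wants to try that conversion, here is a minimal sketch in Python that shells out to ffmpeg. It assumes ffmpeg is installed and on your PATH; the function names and the file names in the usage line are made up for illustration. Since both ALAC and FLAC are lossless, the re-encode should not cost any audio quality.

```python
import subprocess


def alac_to_flac_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that decodes src and re-encodes the audio as FLAC."""
    # -i names the input file; -c:a flac selects ffmpeg's FLAC encoder for audio.
    return ["ffmpeg", "-i", src, "-c:a", "flac", dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion (hypothetical helper; requires ffmpeg on PATH)."""
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(alac_to_flac_command(src, dst), check=True)
```

Usage would be something like `convert("track.m4a", "track.flac")`, where both paths are placeholders.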
ALAC is open source and royalty free since 2011. https://macosforge.github.io/alac/
Wow. How did I miss that!!!
The "iTunes going DRM free" was a big deal around 2008.
https://web.archive.org/web/20070207234839/http://www.apple....
https://www.theguardian.com/technology/2008/may/15/drm.apple
I don’t know about Mountain Fever, but for anything I haven’t been able to find on Bandcamp, I’ve been able to find on Qobuz.
they made cd singles and single song purchases long before streaming
Cost recovery isn’t profit. Copyright is just a shared delusion, like most laws. They’re just bits on a disk we’re told are special for ~100 years (or whatever the copyright lockup length is in your jurisdiction), after which they’re no longer special (having entered the public domain).
I think what is more ironic is we somehow were comfortable being collectively conditioned (manufactured consent?) with the idea that you could lock up culture for 100 years or more just to enable maximum economic extraction from the concept of “intellectual property” and that to evade such insanity is wrong in some way. “You can just do things” after all.
It's not the bits that are copyrighted, it's the performance and the creative work.
Your savings account is just bits on a disk, yet presumably it represents value that you worked for and which belongs to you to do with what you wish.
> Your savings account is just bits on a disk, yet presumably it represents value that you worked for and which belongs to you to do with what you wish.
That's another example of the shared delusion, since yes, we tell each other it represents labor and resources, and the market engages in allocation somewhat efficiently, and so the money is a pretty accurate representation of the value of labor and the value of resources.
In reality, that's not true, because the most highly compensated jobs are some of the least valuable, such as investment bankers, landlords, or being born rich (which isn't even a job, but is compensated anyway). Rent seeking is one of the most highly compensated things you can do under this system, but also one of the most parasitic and least valuable things.
Your savings account's number is totally detached from accurately representing value. It's mostly a representation of where you were born.
Everyone is doing it, who cares anymore. Genie's out of the bottle, we could've tried to solve this for decades and yet we didn't so now we reap what we sowed. Happens, move on.
Do you have evidence they are profiting? I'm genuinely curious how these kinds of archives sustain themselves.
They take donations.
Just to nitpick, that doesn't imply profit. They could be breaking even (and probably are working at a loss).
Or they know that those parties are going to hammer their servers no matter what so they will at least try and get some money out of it.
That made me chuckle, Enterprise Level Access. I mean, for an AI company that’s incredibly cheap; instead of torrenting something, why not just buy it? That price is just a fraction of an engineer's salary.
But then you have a money trail connecting the company unambiguously to copyright violations on a scale that is arguably larger than Napster.
I believe they're largely targeting foreign companies who don't care much about US copyright law.
I mean Facebook and Anthropic both torrented LibGen in its entirety.
Yeah, how devastating it would be for Anna's Archive to be found skirting copyright laws. Their reputation may never recover.
/s
He meant the AI companies
I mean, the same comment applies mutatis mutandis.
> I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
Sounds like one of these: https://krebsonsecurity.com/2025/11/is-your-android-tv-strea...
Probably not your problem to play tech support for these people and explain why being part of a botnet is bad, but mildly concerning nonetheless!
Who cares, today is pretty easy to be part of a botnet. Having a slightly outdated lightbulb qualifies, so I'd not bother.
Spotify is $12/month at most to get unlimited ad-free access to virtually all music.
To get access to "all" TV content legally would be hundreds of dollars a month. And for many movies you must buy/rent each individually. And legal TV and movies are much more encumbered by DRM and lock in, limiting the way you can view them. (like many streaming apps removing AirPlay support, or limiting you to 720p in some browsers)
I think Spotify wins over pirating because of its relatively low cost and convenience. Pirating of TV and movies has increased as the cost to access them has.
> The Anna’s archive group is ideologically motivated.
Very interesting, thank you. So using this for AI will just be a side effect.
And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.
I think the people seeding these are also ideologs and so would be interested in also supporting the obscure stuff, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.
Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.
> The Anna’s archive group is ideologically motivated.
Anna’s archive business is stealing copyrighted content and selling access to it. It's not ideologically motivated.
What ideology is about pirating books and music where most of the people producing this stuff cannot afford to do it full-time? It's not like pirating movies, software and large videogame studios, which is still piracy, but they also make big money and they don't act all the time in the interests of the users.
Writers and musicians are mostly broke. If we sum the rising cost of living, AI generated content and piracy, there's almost no reward left for their work. Anna’s archive is contributing to the art and culture decadence. They sell you premium bandwidth for downloading and training your AIs on copyrighted content, so soon we can all generate more and more slop.
Agreed. I see far too many people rationalizing piracy as a principled thing to do. Instead of finding ways to improve the market such that the control of content isn't siloed in monopolistic corporations, many celebrate Annas Archive which is itself a more or less monopolistic profit-interested entity. The major difference being that we don't have to pay directly. The cost continues to fall on the writers and artists and the industry suffers.
> Instead of finding ways to improve the market such that the control of content isn't siloed in monopolistic corporations
I always assumed the "Anna" in the name was for "Anarchist." My assumption about the archive is that they don't believe there's an ethical solution to the restriction of access to data that involves a capitalist market.
> I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand.
It may be relevant for those people, but I lost all interest in current TV or streaming stuff. I just watch youtube regularly. What's on is on; what is not on is not really important to me. My biggest problem is lack of time anyway, so I try to reduce the time investment if possible, which is one huge reason why I have zero subscriptions. I just could not keep up with them.
They’re doing it for everyone, so, yes, they are doing it for AI companies.
The metadata is probably more useful than the music files themselves arguably
Especially since they scraped Spotify's popularity rating as well
I can't think of many situations where that would be particularly valuable, considering it favours recent plays and the cutoff date is already almost half a year old.
Helps train an algorithm to figure out which music is popular, as a training signal
If that's all the issues there are with the dataset, it is probably far and away the best dataset any researcher has ever used.
> this doesn't even seem particularly useful for average consumers/listeners
I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.
This leak will also be really useful to bad actors who will resell the music from this list without paying royalties to the artists.
Which is how Spotify started... And is still carrying on. So nothing has changed.
I think they build the demo with pirated music, but it was licensed by the time customers started paying for it.
Correct, the pirated music library was before they exited the closed Alpha.
Spotify pays 70% of revenue to rights holders.
Why don't you ask them where the money intended for artists is going? You know? The small insignificant companies of Sony, Warner Music, EMI that own the vast majority of music and own all the contracts?
I just started DJing and something I quickly noticed is how garbage Spotify's music sounds compared to FLACs I've purchased. The max bitrate is very low.
Spotify just (last week or 2 weeks ago) introduced lossless compression (FLAC) and it sounds amazing.
tidal is a thing and can be scraped the same way. I wonder how big that collection would be as it can go from 50mb to 300mb for 3min
Spotify fucks over most artists anyway, so who cares?
Spotify pays the rightsholders. What are they supposed to do about the shitty contracts that the artists signs with the labels?
yeah it's wild to me how folks will defend the current status quo when it's clearly broken.
people defend convenience way too much. spotify isn't good for us and spotify-like-streaming is destroying the music industry.
this argument is so tired.
most artists dont really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.
most musicians just want to make music, express themselves, and connect with folks who enjoy their stuff or want to make music with em.
Even some of the largest artists in the world only receive a few grand a year from streaming. Only the top 1% or so of artists get enough streams to even come close to living off it. It isn't that big of a deal. Music piracy isn't the theft people think it is, lars.
youtube is kind of the same way. the real money comes from sponsorships which come from engagement. nobody on youtube is upset that their video got stolen because that mentality was never sold to us to justify screwing us over. musicians, however, were used as pawns so music labels could get more money.
now folks will say stuff like "this is theft" which is just a roundabout way of supporting labels who steal from the artists. so, it's just a weird gaslighting. there's a reason folks turned on metallica over the napster stuff. metallica were being used to further the interests of labels over the interests of fans. and now you're doing the same thing :) It's a script we hear over and over again yet people keep falling for it.
> most artists dont really care about streaming or selling their music. most of their real money comes from touring, merch, and people somehow interacting with them.
I think you have it the wrong way round. I'm sure that musicians would love to make money from album / song sales. It's just that between piracy and companies like Spotify, artists make pennies on these activities, so their only choice is to make money on more labor-intensive stuff where they retain more control.
Note that Spotify, somehow, finds it profitable to be in the streaming business.
I think it was Les Claypool (of the band Primus) who said on some podcast that recording a studio album with its attendant very non-trivial costs is really just creating a very expensive business card to hand out to prospective clients.
Back then, that is. It probably cost $250k in 1990 for them to record Frizzle Fry in a studio, handwave $500k in 2025 dollars. But Bandcamp on a MacBook and some gear from GuitarStudio, round to $15k and your time. Neither of which is trivial or cheap, but it's not 1990 no more.
> I'm sure that musicians would love to make money from album / song sales.
i think we're actually in agreement. I just don't see streaming as a "must". A lot of musicians I work with and follow also don't see streaming as a must. It's a necessary evil in today's convenience fixated life/culture.
Most musicians I ask about this absolutely fucking hate streaming and don't view it as a real revenue stream.
That's why nearly all merch tables still have CDs, bandcamp links or records for purchase. Artists make more money off a t-shirt sale than they do from 50,000 streams.
I think you slightly misinterpreted what I meant by "selling their music". Or I might have said it poorly.
also, piracy does not mean less money for small artists. evidence suggests the opposite, i think. I think piracy marginally harms record sales for the top 1% of artists while benefiting basically all other artists.
piracy = free exposure. more exposure means more ticket sales, more merch sales, etc. most musicians i know just want people to hear their stuff. piracy enables that for the majority of folks who can't afford to buy every album. i think artists care more about their art being used in commercial stuff without permission/payment, not everyday people checking their shit out.
Spotify paid out ten billion dollars to artists in 2024. This is not small potatoes - total 2024 music industry merchandise sales was around $14b.
Youtube also paid out literally 50x more to creators in 2024 than Patreon had total subscriptions on the platform.
These big platform payouts matter a lot.
> This is not small potatoes
Unless you're a small potato. Approximately 0% of what I pay for spotify goes to the artists I actually listen to. Fucking Taylor Swift and the Beatles estate don't need my money.
As a reasonably known but not super popular bluegrass artist, I agree: please steal my music instead of paying Spotify for it.
Some quick Googling shows 1 million streams pays approx $2000.
You'd need 40,000,000 streams to earn $80,000.
be aware that payout rates change based on tiers and a bunch of other factors. So, it would likely take more than 40 million streams to earn $80k.
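To make the arithmetic above explicit, here is a small sketch. It assumes a flat rate of $2,000 per million streams, which is just the rough figure quoted in this thread; real payouts vary by tier, region, and rights splits, so treat the result as a lower bound on the streams needed.

```python
def streams_needed(target_usd: float, usd_per_million_streams: float = 2000.0) -> float:
    """Streams required to gross target_usd at a flat per-stream rate.

    The default rate is the thread's ballpark figure, not an official number.
    """
    return target_usd / usd_per_million_streams * 1_000_000


print(f"{streams_needed(80_000):,.0f} streams")  # 40,000,000 streams
```

Halving the assumed rate doubles the requirement, which is why the caveat about tiers matters so much.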
I believe Weird Al posted his streaming revenue a few years ago. He had something like 80 million streams and said he earned about $12. https://www.billboard.com/music/pop/weird-al-yankovic-wrappe...
There is a reason people like T Swift and whatnot tour constantly, it's how they make money. Weird Al is known for his amazing live shows, there's a reason for it: they make more money.
When he says "so if I'm doing the math right that means I earned $12" I interpret that as him exaggerating for effect. It's definitely not him citing the pay slip.
"$2 or more per thousand streams, split across rightsholders" seems like an accurate estimate.
That seems reasonable?
Assume an artist (either directly or through a rights holder) makes 1/3 income from streaming, 1/3 from merch and physical albums, and 1/3 from live events.
40m streams per year would be 800k per week. 200k fans worldwide playing 4 times per week on average could get you there. That's like a decent sized but not enormous youtube channel.
200k fans worldwide would also support the ticket sales and merchandise sales aspects.
You only need 5000 fans to buy your CD/album/w.e at $15 to make 80k
Per year, which is a big lift compared to them pressing play on Spotify
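A quick sanity check of the numbers in this subthread, as a sketch. The fan counts, play rates, and the $15 price are the figures quoted above; the function names are made up for illustration, and the album figure is gross revenue before any costs or cuts.

```python
import math

WEEKS_PER_YEAR = 52


def yearly_streams(fans: int, plays_per_week: int) -> int:
    """Total streams per year if every fan plays the given number of times a week."""
    return fans * plays_per_week * WEEKS_PER_YEAR


def buyers_needed(target_usd: float, price_usd: float) -> int:
    """Smallest number of full-price sales that grosses at least target_usd."""
    return math.ceil(target_usd / price_usd)


print(yearly_streams(200_000, 4))  # 41600000
print(buyers_needed(80_000, 15))   # 5334
```

So 200k fans at 4 plays a week does clear 40m streams a year, while 5,000 album sales at $15 grosses $75k, a little short of $80k.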
99% of that 10 billion went to a handful of artists. Actually, I'd wager nearly half of it went to labels and other middlemen, but that's beside the point. The vast majority of money in the music industry never trickles down, ever.
edit: I looked it up, 70% of spotify's payouts go directly to labels, not artists. So...that $10 bil is nothing.
This is by design and it's the same broken system that metallica defended in the 90s/00s because it benefits large artists while fucking over the other 99%.
We keep repeating the same script using the same busted short term logic.
Labels suck but when we're considering the merits of Spotify it's not their fault and artists can put music on the service without an abusive label.
Ah so you're only stealing a bit of money from the artists. That's ok then.
Touring makes almost no money. Only concerts with >1000ppl make money. Below that you can assume not even the sound engineer gets paid.
Not true at all. I support small artists and it's the only way they make money. Ticket sales and merch make up the vast majority of artist revenue for artists who arent in the top 1%. Most musicians don't make money if they aren't touring or selling merch somehow.
there's also the invaluable aspect of networking that touring allows. bit of a tangent, but it's very important for musicians to network.
The exception are musicians who do production stuff. Think movie/tv scores, commercials, etc. I actually know a handful of artists who used to tour quite a lot but eventually settled down to do production stuff. So they transitioned from touring to make money to production. Touring all year with no healthcare catches up to people.
I know a number of musicians that tour nightclubs, small venues, and festivals.
They make a living; not a luxurious one, but they do OK. They just enjoy making music, and feel that it's worth it. Many of them never even record their music.
>The thing is, this doesn't even seem particularly useful for average consumer
it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?
the average consumer will flock when Spotify ultimately enshittifies
Netflix didn't lose content by choice. Actual right holders decided to pull their content and create rival services.
Has nothing to do with perceived enshittification by Netflix (even though they have enshittification too).
Spotify is under the same threat: they have no content that they own. Everything is licensed.
I thought they started producing their own podcasts. Can't bring in much though.
But, Netflix did lose their content by choice! Way back in the 00s, you could pay Netflix something like $5 a month, and they would mail you physical DVDs of almost any movies you could ever want to watch. In fact, my recollection is that the physical library was generally much more extensive than the streaming library, at least through the early ‘10s.
Sure, they had the rug yanked out from under them with digital streaming, but they very deliberately put themselves into that position when they pivoted to streaming in the first place.
There was never a time that Netflix had the majority of popular movies on their streaming service.
For their mail service they did
>I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
What's stopping someone from sticking a microphone next to their speaker?
Slow, but effective.
> Slow, but effective.
I wouldn't call this very effective. It would take an impractically long amount of time to capture a meaningful fraction of the collection and quality would suffer greatly.
Even if you plug the audio output into the input you would still be taking a quality loss by passing the audio through a DAC and then an ADC. Maybe if the quality of your hardware is good enough it wouldn't matter, but then you would be limited to only ripping 24 hours of audio per day...
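To put a rough number on "impractically long", here is a sketch of how long real-time recording would take. The 186 million track count is the figure quoted elsewhere in this thread, and the 3-minute average track length is purely an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_record(track_count: int, avg_track_seconds: int = 180,
                   parallel_recorders: int = 1) -> float:
    """Days needed to capture every track in real time.

    avg_track_seconds is an assumed average, not a measured one.
    """
    total_seconds = track_count * avg_track_seconds
    return total_seconds / (SECONDS_PER_DAY * parallel_recorders)


days = days_to_record(186_000_000)
print(f"{days:,.0f} days, about {days / 365.25:,.0f} years on one recorder")
```

Under those assumptions a single recorder needs on the order of a thousand years, and even a thousand recorders running around the clock would need more than a year.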
Audio fingerprinting?
>Audio fingerprinting?
Bought a spotify card with cash, email was registered on public wifi.
Who cares? :-)
They'd probably do a shit job of capturing it?
> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?
I dunno if they publish like a 10 TB torrent of the most popular music I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.
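A back-of-the-envelope check of that estimate, as a sketch. The 160 kbit/s bitrate, 3-minute average track, and 4 hours of listening per day are all assumptions picked for illustration; 10 TB is taken as 10^13 bytes.

```python
def songs_that_fit(disk_bytes: int, kbps: int = 160, avg_seconds: int = 180) -> int:
    """How many tracks fit on a disk at a constant bitrate (assumed values)."""
    bytes_per_song = kbps * 1000 * avg_seconds // 8
    return disk_bytes // bytes_per_song


def years_to_listen(songs: int, avg_seconds: int = 180, hours_per_day: float = 4.0) -> float:
    """Years needed to hear every track once at a given daily listening time."""
    total_hours = songs * avg_seconds / 3600
    return total_hours / hours_per_day / 365.25


print(songs_that_fit(10 * 10**12))           # 2777777
print(round(years_to_listen(3_000_000), 1))  # 102.7
```

So roughly 3 million songs on 10 TB is in the right ballpark at that bitrate, and hearing them all once at 4 hours a day would take about a century.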
It's probably going to make the AI music generation problem worse anyway...
I would expect more data to make ai music generation better
When they say "worse" they do mean the AI will get better which will be worse because they are ideologically opposed to AI.
Just cite facebook getting busted training its AI on torrents proven to contain unlicensed material lol
Thank god we are taking care of the “researchers working on things like music classification and generation” ! As long as we can convince ourselves we have a sound analysis of it, no need to support and defend people making actual art right. So much already made, who needs more?
This is not to defend Spotify (death to it), but to state that opening all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify to more fairly compensate the musicians who create the works they train their slop generators with.
What, precisely, is the point you’re trying to make here?
Expressing frustration at the pervasive tendency of technologists to look at everything, including art which is a reflection of peoples' subjective realities, with an "at-scale" lens, e.g., "let's collect ALL of it, and categorize it, and develop technologies to mash it all together and vomit out derivative averages with no compelling humanist point of view"
I hope readers will feel our frustration.
Well, that seems like a pretty reasonable thing to be pissed off about, thanks for taking the time to elaborate.
I think the overlap between the bureaucratic technologies developed by people who, by all accounts, are genuine lovers of the subjectivity and messiness of music qua human artistic production (e.g. the algorithmic music recommendation engines of the '00s and early '10s; public databases like discogs and musicbrainz; perhaps even the expansive libraries and curated collections in piracy networks like what.cd), and the people who mainly seem interested in extracting as much profit as possible from the vast portfolios of artistic output they have access to (e.g. all of Spotify's current business practices, pretty much), should probably prompt some serious introspection among any technologists who see themselves in that first category.
I read an essay a number of years back, which raised the point that, if you're an academic or researcher working on computer vision, no matter how pure your motives or tall your ivory tower, what do you expect that research to be used for, if not surveillance systems run by the most evil people imaginable. And, thus, shouldn't you share some of that moral culpability? I think about that essay a lot these days, especially in relation to topics like this.
How does Spotify defend people who actually make art? There's virtually no difference between pirating and streaming through Spotify for the vast majority of artists.
updated - thank you commenters for making it clear that my sentiment was not clear
Spotify doesn't take care of artists, if you knew any artists you'd understand that Spotify is atrocious for people who make music.
> I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
Do they have DRM at all? Youtube and Pandora don't.
Spotify has DRM, and you can find open-source reimplementations of it on github.
Their native clients use a weak hand-rolled DRM scheme (which is where the ogg vorbis files come from), whereas the web player uses Widevine with AAC.
Yes they do use DRM. I know they are using Widevine on the web player, but possibly other ones too (never looked very far). Not sure for the app, it might be that it is using OGG streams with a custom DRM (which is probably the one some existing downloaders actually (ab)use).
It's called playplay. It's used for protecting their new lossless files. But the first rule of playplay is you can't talk about playplay. https://torrentfreak.com/spotify-dismantles-spotifydl-track-...
YouTube Music uses Widevine.
If it's on YouTube Music, it's also on... YouTube.
Not necessarily at the same quality though.
I assume in most cases they're literally the same files. Youtube runs "topic" channels for music that distributors have sent it.
https://www.youtube.com/channel/UCYOa-hi751OKY2zGJJv6V2A
https://www.youtube.com/watch?v=MSSxnv1_J2g (same thing, but on an official channel instead)
You can load any youtube music song on youtube by just removing the "music" subdomain.
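As a sketch of that trick in code, here is a small helper that drops a leading "music." from the hostname. The function name is made up for illustration, and whether a given track actually plays on the main site is not something this checks.

```python
from urllib.parse import urlsplit, urlunsplit

MUSIC_PREFIX = "music."


def strip_music_subdomain(url: str) -> str:
    """Return url with a leading 'music.' removed from the hostname, if present."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if netloc.startswith(MUSIC_PREFIX):
        netloc = netloc[len(MUSIC_PREFIX):]
    # Path, query string, and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_music_subdomain("https://music.youtube.com/watch?v=MSSxnv1_J2g"))
```

URLs that do not start with the "music." subdomain are returned unchanged.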
Then why do you say they might not be the same files?
Let me start over. Youtube itself has DRM required for certain videos, and certain formats of videos.
The 256 kbps format for music will be protected by DRM. If you do not have DRM available youtube will fall back to a lower quality format to play the audio.
Music might have higher quality audio-only files as provided where Youtube might have it combined with video and a generic compression algorithm applied as with all other uploaded videos.
Just like with anything digital you (and Spotify) are fully at the mercy of the rights holders. When (not if) they pull their stuff, or replace their stuff, or change their stuff, you can never get the original back unless you preserve it.
Largest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war, and Spotify pulling out of Russia. So they don't have the licenses to a lot of stuff because that belongs to companies operating within Russia.
DRM aside, Spotify clearly should have logic that throttles your account based on requests (only so many minutes in a day..), making it entirely impractical to download the entirety of it unless you have millions of accounts.
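Something like the following is what I have in mind, as a very rough sketch. This is a hypothetical per-account limiter, not anything Spotify is known to run; the class name, the in-memory dict, and the string day key are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class DailyPlaybackBudget:
    """Hypothetical limiter: refuse tracks once an account has been served
    more audio in one day than could be played back in that day."""
    limit_seconds: int = SECONDS_PER_DAY
    used: dict = field(default_factory=dict)  # (account_id, day) -> seconds served

    def allow(self, account_id: str, day: str, track_seconds: int) -> bool:
        key = (account_id, day)
        served = self.used.get(key, 0)
        if served + track_seconds > self.limit_seconds:
            return False  # over budget: deny without charging the account
        self.used[key] = served + track_seconds
        return True


budget = DailyPlaybackBudget(limit_seconds=600)
print(budget.allow("acct-1", "2025-01-01", 300))  # True
print(budget.allow("acct-1", "2025-01-01", 300))  # True
print(budget.allow("acct-1", "2025-01-01", 300))  # False
```

A cap like this limits each account to real-time playback speed, so bulk downloading would only work by spreading requests over many accounts.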
>unless you have millions of accounts.
Challenge accepted…
This is probably how they did it: over time, use a few thousand accounts, queue up all the things, and download everything over the course of a year.
Notably 160kbit is the free-tier bitrate, so they presumably used unpaid accounts.
>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.
They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook annual revenue is about twice that of the entire global recording industry. The strategy these companies took was probably correct but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief they never will have to.
> They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook [...]
I think it's pretty clear from history that they are too big to have to pay out a huge settlement.
First, they never had to. There was never a "huge" settlement, nothing that actually did hurt.
Second, the US don't do any kind of antitrust, and if a government outside the US tries to fine a US TooBigTech, the US will bully that government (or group of governments) until they give up.
Anthropic had to pay $1.5 billion recently so you're incorrect. I'm sure more of such cases will come up against big tech too.
I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI. Even if AA themselves don't know it and just take the cash.
> The thing is, this doesn't even seem particularly useful for average consumers/listeners
Yeah. To me it is not really relevant. I actually was not using spotify and if I need to have songs I use yt-dlp for youtube but even that is becoming increasingly rare. Today's music just doesn't interest me as much and I have the songs I listen to regularly. I do, however, also listen to music on youtube in the background; in fact, that is now my primary use case for youtube, even surpassing watching movies or anything else. (I do use youtube for getting some news too though; it is so sad that Google controls this.)
To put this into perspective, What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth. What had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.
[0] https://en.wikipedia.org/wiki/What.CD
I think what earned what.cd that title wasn't necessarily just the amount but the quality, as you mentioned, as well as the obscurity of a lot of the offered material. I remember finding an early EP of an unknown local band on there, and I live in the middle of nowhere in Europe. There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown. It was the equivalent of vinyl crate digging without physical restrictions.
Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.
> There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown.
Music licensing (in the US at least) is actually pretty nice for this (from the licensee perspective anyway). There are mechanical licenses which allow you to use music for many uses without contracting with the rightsholders and clearinghouses whose job is to determine where to send royalties. So you can use the music and send reporting and royalties to the clearing houses and you're done.
Of course, you may want to contract with the rightsholders if you don't like the terms of the mechanical license; maybe it costs too much, etc. If you're Spotify or similar and you have specific contracts for most of the music, and have to pay mechanical license rates for the tail, it might make sense to do so in order to boast of a larger catalog.
I’m still using the “successor” to what.cd and I usually discover artists through random lists, “related artists”, among other things on the platform.
One interesting way of discovering artists is finding an artist that I already like on a compilation CD, and then seeing what else is on the CD.
Would you share the name of that successor? I miss the old internet and would love to take a look.
It's Redacted.sh, a.k.a. RED. They have around three million torrents. But like What.CD, Redacted.sh is a private tracker, so you can't just jump in and see the content.
Another comment mentioned Redacted.sh as a successor. I haven't used it. I'm sure there's a subreddit around that can help. Looks like orpheus is another option if I'm reading correctly. You have to get an invite or pass an "interview" though, so be prepared to wait a while.
the compilation album is a great idea. thanks for that. your comments in here have been helpful. have fun listening.
Yeah, What.CD had a bunch of the local Brisbane post-rock bands from the 00s on there which was amazing to me. I at least have copies of a lot of their records!
email me please
True but What.cd had a tremendous amount of notable music not available on Spotify though because it was also sourced from cds, bootlegs, vinyl, tape etc whereas Spotify only includes music explicitly licensed for streaming.
This is true and a category of music that got hit notably hard was live recordings. What had a wide array of live recordings made by sound engineers straight from the mixer. This is something that you simply cannot find now unless you maybe know a guy.
That's why I use YouTube Music as my streamer as they allow damned near anyone to upload any old rare record and then figure out the royalties somehow.
Yeah, it was a great place. I have a paid Spotify account but finally got an ancient hard drive onto my network for all sorts of stuff Spotify doesn’t or can’t have (e.g., Coldcut: 70 Minutes of Madness).
Yes. RIP a ton of very rare material. What.cd has a special place in my heart.
Redacted.sh is a worthy successor, but the average person just doesn’t care about “which release is best” anymore. I use YT Music as a backup but Redacted is my main source of music these days.
Don't you consider it best to ... redact ... your post, as it's the only one mentioning it by name?
Some people just don't know when to shut the hell up.
At the end of the day it feels like the private trackers are such a nightmare to get invited to and maintain ratio at that it's just not worth the effort.
I want this torrent though. It would be fun to stand up a NAS for this.
You can’t talk about what.cd without talking about its precursor OiNks Pink Palace. Even Trent Reznor was public about what an amazing place it was. Music aside, the community existing just for the shared love of music and not for any other kind of monetary or influencer gain is what set it apart. We just don’t have those kinds of communities for music online anymore
>We just don’t have those kinds of communities for music online anymore
They're still kind of around, but yeah, everything is very much on its way out in the music scene, at least in terms of that late 90s early 00s culture. Or has been until recently. There is a renewed interest in self-hosting and "offline" style music collections.
It sucks too. The way folks discover music is important. The convenience of streaming has led to some interesting outcomes. When self-hosting music comes up this is always one of the top questions people have: How do you find new music?
The answer isn't that hard and really hasn't changed much. People just don't want to spend any time or effort doing it. Music stores still exist, they're amazing. Lots of 2nd hand stores carry vinyl and CDs now, which can give you great ideas for new music. There are self-hosted AI solutions and tools. Last.fm and Scrobbling are still very much around. My scrobble history is so insanely useful. There are music discords. Friends. Asking people what they're listening to in public. Live shows with unique openers(I once went to a Ben Kweller show with 4 opening bands, I still listen to 3 of them.)
I mean, WCD has two healthy replacements, plus slsk
I love that SoulSeek still exists in some format. My path was Napster (made me get cable Internet and a cd burner) > AudioGalaxy (learned how to path things on routers so I could download music to home from work) > SoulSeek. Plus it had some useful chat and people who cared about sound quality and metadata.
Soulseek has to be the best kept secret on the internet. Even people my age who grew up with things like Napster, Limewire, and even soulseek, don’t know that it still exists.
The amount of extremely obscure music on there is crazy, stuff that exists nowhere else in the internet except maybe google drive links.
That being said, I have a lot of non-mainstream tracks in my playlists on YouTube Music that have YouTube comments along the line of “I wish this was available on Spotify :’(“. I bet the same goes for What.CD.
So there’s some way to go for a comprehensive music archive.
Redacted, their replacement, now has more records than they had.
Well, what.cd counted any album as one torrent. While current spotify has also podcasts and AI slop.
Incredible.
> A while ago, we discovered a way to scrape Spotify at scale.
They won't and shouldn’t divulge the details, but I imagine that would be a fun read!
It is not hard. But please don't misuse it and ruin the fun for everyone. It is nice to be able to use the music relatively easily for hobby projects. My music server has functionality to play tracks from Spotify this way:
https://codeberg.org/raphson/music-server/src/branch/main/sp...
Where the magic actually happens: https://github.com/librespot-org/librespot
How they manage to transfer 300TB of data while remaining anonymous is also astonishing.
I would guess this can be hidden under normal music streaming activity? But one would need lots of proxies!
It's hard to imagine anything but physical egress for that kind of volume.
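For a rough sense of scale, here is a back-of-the-envelope sketch of how long 300 TB takes over a sustained network link. The link speeds are illustrative assumptions, not anything the archive has disclosed, and decimal terabytes are assumed.

```python
def transfer_days(size_tb: float, link_gbps: float) -> float:
    """Days needed to move size_tb (decimal TB) over a sustained link_gbps link."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86_400


# Hypothetical sustained link speeds, chosen only for illustration.
for gbps in (0.1, 1, 10):
    print(f"{gbps} Gbit/s: {transfer_days(300, gbps):.1f} days")
```

This ignores protocol overhead, rate limiting, and the fact that a scraper would likely be spread over many connections, so treat the numbers as an order-of-magnitude guide only.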
they're probably just using something like https://github.com/nor-dee/spotizerr-spotify
No way, that would take far too long.
Probably not, those tools don't actually download Spotify tracks at source quality.
There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.
Sadly since zspotify was killed I don't know of any remaining tools.
votify
This work is so critical.
Read an article that was published just 10 years ago, and witness the bit rot as most external links will 404, gone forever.
I think it's worth questioning the value of preserving -everything-, but it seems like if we can, we should.
I just found out that https://annas-archive.li/ is masked by my German internet provider (SIM.de/Drillisch). I usually use a VPN but I had it switched off temp. to watch Fallout (Prime Video won't let you watch through a VPN). Only when I switched Mullvad back on could I open the site.
I didn't know German providers do this.
In that vein, I am trying to find out why searching for
which should trivially yield http://github.com/alextud/PopcornTimeTV results in anything but that one particular URL in every search engine: Google, Kagi, DuckDuckGo, Bing. They even find a fork of that particular repo, which in turn links back to it, but refuse to show the result I want. Haven't found any DMCA notices. What is going on?
They have marked the repo as noindex (or GitHub is forcing a noindex header).
It's returning a noindex flag so every SERP is correctly doing what the repo has asked.
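For anyone who wants to check this kind of thing themselves, a page can ask not to be indexed in two common ways: an `X-Robots-Tag` response header or a `<meta name="robots">` tag. Below is a minimal sketch that inspects already-fetched headers and HTML. It makes no network request, the function name `is_noindex` is made up for illustration, and it says nothing about what GitHub actually sends for that repo.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def is_noindex(headers: dict, html: str) -> bool:
    """True if the response asks crawlers not to index the page."""
    # Header form: X-Robots-Tag: noindex
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # HTML form: <meta name="robots" content="noindex">
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)
```

Feed it the headers and body from whatever HTTP client you prefer. Search engines may also apply their own rules on top, so a negative result here does not prove a page is indexable.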
That is... except for brave! I checked on my searx instance and it still showed up in brave's results
Very interesting. The security page does show up on kagi at #6.
I wonder if GitHub flags it to not be indexed or something.
Was also shocked to see that (Berlin, Telekom here).
I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.
There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.
Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?
This is something really important, especially in the days when music and film vanishes from platforms one by one. I myself have three playlists with greyed out titles (titles are missing so there's no possibility for me to find out what was there).
That's why I divide music to the one that I want to have forever - I buy it on CDs - and dance music that I can live without one day
Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, and selectively download the part of the torrent in response to on-demand streaming requests from the client.
spotify used to do just that (stream p2p) until 2014 or so
https://www.scribd.com/document/56651812/kreitz-spotify-kth1...
The person who wrote this Spotify p2p software also wrote uTorrent, which was bought by the company BitTorrent after they struggled to make a C++ client on their own. The original BitTorrent implementation was in Python, but they re-skinned uTorrent as BitTorrent and shipped both for a few years. https://en.wikipedia.org/wiki/Ludvig_Strigeus
I recently got into the whole homelab *arr stack for things like movies and tv and while I know options exist for music I just don’t see the need yet price-wise. Spotify is still just cheap enough for me to not care enough. We’ll see how long this holds.
That being said it’s no secret Spotify and other streaming services barely pay even popular artists. Artists make money from live shows and merch. The fact that their music is behind a paywall at all could mean they make less money from some lack of exposure.
I do hope one day self-hosting music with an extremely easy setup with torrenting for sourcing is set up again. What I’m talking about exists to some extent, but it’s not trivial for most people.
for me its the arms trade.
Daniel Ek pours spotify wealth into next gen miltech.
sometimes I worry that I don't know what music means to other people but I am certain that to me it is antithetical to war culture.
Yeah we shouldn’t. But we may.
a la "Popcorn Time."
Hmm. This is actually not really something I need, I think; but I consider anna's archive etc... as about as important as the internet web archive. We need to preserve data, at the least important data, also historic data - how the original websites looked. Creativity of past generations. Same for games and books.
It may be only ~30 years for webpages to have emerged, but there are also many young people who may not have experienced that since they are too young to have experienced it. There is always a generational change; our generation has the opportunity to store more things.
Since the article asks:
> We're curious about the peaks at whole minutes (particularly 2:00, 3:00, 4:00). If you know why this is, please let us know!
As a hobby video/audio editor, people will start with their track taking up a preset amount and fill up the time - even if it means having some dead space at the end.
The other alternative is algorithmically created music.
I've heard 2:00 is some kinda sweet spot for the Spotify algorithm and payouts? You get paid per play so you don't want it too long, but if your track is much shorter than two minutes you get penalized or something. I know they've had to remove ambient tracks that were cut into 40 second clips as part of this.
So you might see a lot of anchoring just like YouTube videos kept stretching to almost exactly ten minutes?
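If someone wants to test the anchoring idea against the metadata, one simple measure is the share of tracks whose length lands within a small window of a whole minute. This is only a sketch: it assumes durations are available in milliseconds, and the function name and tolerance are made up for illustration.

```python
def whole_minute_share(durations_ms, tolerance_ms=500):
    """Fraction of tracks whose length is within tolerance_ms of a whole minute."""
    if not durations_ms:
        return 0.0
    hits = 0
    for duration in durations_ms:
        remainder = duration % 60_000
        # Distance to the nearest whole minute, above or below.
        distance = min(remainder, 60_000 - remainder)
        if distance <= tolerance_ms:
            hits += 1
    return hits / len(durations_ms)
```

If track lengths were spread evenly, a 500 ms tolerance would catch about 1 in 60 tracks (1000 ms out of every 60,000 ms), so a share well above roughly 1.7% would point to anchoring at whole minutes.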
Hmmm I don’t like this. There are sources for music with better quality out there and all this will do is paint them a bigger target for takedowns/prosecution. I am worried about losing their ebook library. Quoting from the announcement: “Generally speaking, music is already fairly well preserved.“ They should have done this as a separate identity.
I hope someone builds an open API around this metadata. I'd love to have alternatives to the big player APIs.
Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.
This already exists and is interesting to play around with - https://github.com/ASLP-lab/DiffRhythm
Is the music torrent not up yet? Only see the metadata one here: https://annas-archive.li/torrents/spotify
Yeah, in the article they write:
The data will be released in different stages on our Torrents page:
[X] Metadata (Dec 2025)
[ ] Music files (releasing in order of popularity)
[ ] Additional file metadata (torrent paths and checksums)
[ ] Album art
[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)
Oh I see, thanks! I missed that
I wonder if Spotify will pursue any legal actions to take this archive or the site down!
I wonder how deep the hole they're gonna put whoever runs this site into is gonna be?
Site is down for me. Archive link: https://archive.is/jf3HW
Probably not down, but blocked by your ISP. Try a VPN. Same thing happens here.
Yes, blocked. This is what I see in germany without a VPN
https://notice.cuii.info/
"Their business model is based on copyright infringement"
Well, where to complain that Anna's Archive ain't a business?
Ironic. But its working for me.
This is some of the greatest news I've ever heard for the digital preservation community. Just so many projects over the years could have used resources like this. Thank you for contributing to humankind!
Amazing! I wonder if the Every Noise At Once[1] site could be updated with the metadata from this?
[1] https://everynoise.com/
This is incredible. I once assembled a collection of 100,000 tracks for research on exploration of large music libraries. Essentially vector search. I was limited in storage and processing power to a single machine.
If I were to do it today, I could get so much farther with hyperscaler products and this dataset.
TIL Anna's Archive is blocked in Germany (by a rather obtrusive MitM, I might add). Get redirected to a "Copyright Clearing House" or something.
I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.
It seems to be that the metadata doesn't include the lyrics, probably because they are provided by Musixmatch. It would have been nice to have a database of lyrics linked to ISRCs. AFAIK Lrclib doesn't support downloading lyrics for a given ISRC.
Can someone explain why C#/Db (major/minor) is the third most popular key? Very unexpected for me, since it's relatively more difficult to play.
Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.
Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
> Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale)
For the major scale, there are 7 notes in the scale and only 5 black keys; you also need to skip ti, the 7th note.
For the minor scale ("C#m"), it's worse; only four of the five black keys are part of that scale.
And I would have thought that something intended to be played only on the black keys would be described as using a pentatonic scale anyway?
As a belated followup, I should observe that if you're playing "in C sharp minor" on the black keys, you're skipping notes 3, 6, and 7 of the scale... and those are the only notes that differ between a minor scale and a major scale, making the "minor" designation completely meaningless.
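The scale-degree claims above are easy to check mechanically. Here is a small sketch that numbers pitch classes 0 to 11 from C and assumes the natural minor scale (harmonic or melodic minor would give different results for degrees 6 and 7).

```python
BLACK_KEYS = {1, 3, 6, 8, 10}  # pitch classes C#, D#, F#, G#, A# (C = 0)

MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]


def scale(root, steps):
    """Pitch classes of the seven scale degrees starting at root."""
    notes = [root % 12]
    for step in steps[:-1]:  # the last step only returns to the octave
        notes.append((notes[-1] + step) % 12)
    return notes


def white_key_degrees(root, steps):
    """1-based scale degrees that fall on white keys."""
    return [i + 1 for i, pc in enumerate(scale(root, steps)) if pc not in BLACK_KEYS]


C_SHARP = 1
print(white_key_degrees(C_SHARP, MAJOR_STEPS))          # [3, 7]
print(white_key_degrees(C_SHARP, NATURAL_MINOR_STEPS))  # [3, 6, 7]

# Degrees where major and natural minor on the same root differ.
differing = [
    i + 1
    for i, (a, b) in enumerate(zip(scale(0, MAJOR_STEPS), scale(0, NATURAL_MINOR_STEPS)))
    if a != b
]
print(differing)  # [3, 6, 7]
```

So the black keys alone give five notes of Db major and four notes of C# minor, and the notes you have to skip in the minor case are exactly the ones that distinguish it from major.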
Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.
There is a sweet spot for the bass. Lower is better for deep bass, but too low and it stops being a recognizable note, and consumer speakers can't reproduce it. This effect exists though I'm not sure if it is the cause of the pattern here.
Difficult to play in what instrument?
C# I don’t believe was/is a common tuning for most western instruments, classical or modern.
A digital piano can transpose things to make it “easier” to play.
Cursory google search says that a sitar is traditionally tuned to something useful for c#
I’m curious if C# is one of those notes that lines up nicely with whatever crappy consumer stereos/subs were capable of reasonably reproducing in the 90s as electronic music was taking off and it stuck around as tribal knowledge for getting more “oomph” out of your tracks.
I play piano and don’t mind playing in Db at all. The chords fit nicely in the hands
Unrelated, but I just can't stop myself from saying that I absolutely hate Spotify even though I'm a paying customer. Fuck you Spotify. You were supposed to be a convenient way to discover and listen to music. Now you are only convenient for listening to music, and absolutely terrible for any recommendations. This is sad really. Spotify had good recommendations. It's absolutely in a position where it can provide good recommendations — it has both a vast music library and a vast amount of data on user preferences. And it chooses to push procedural/ai-generated slop instead to earn more money. I thought that maybe buying $SPOT stock will make me more at peace with its greed, but it didn't work. Spotify fucking deserves to crash and burn because it sees paying customers as idiots who might not notice they are fed garbage. Fuck you Spotify, fuck you.
YouTube Music works pretty well for me. One great feature is that it includes not just a commercial music streaming catalog, but all user uploads of music on YouTube.
I had to chuck Youtube Music away when it was polluting my youtube playlists with stuff I was liking on youtube music. Me as a video viewer and me as a music listener are two completely different people.
and you can upload 100,000 of your own tracks to the service for your private use as well. It is a great service considering I am getting it as a side effect of youtube premium. Single handedly the last subscription I would cancel.
I always find these takes curious because they could not be further from my experience. I'm still discovering tons of good music. Perhaps it's specific to genres, but I haven't encountered any generated junk tracks.
Really? How about asking google to "play bloomberg news on spotify" next time. Then see if you can remove the resulting chaos from your history so it won't start feeding you slop.
This is more frequent than you would assume. I’ve neither subscribed to Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.
Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.
I really don't understand how focusing on source quality files is supposed to be a "major issue" with the music preservation community. It's bizarre for them to talk about these being barriers to creating a "full archive of all music that humanity has ever produced" and have their answer be scraping Spotify to end up with a music library comprised of many AI and bulk produced songs at 75/160kbps.
Merry Christmas!
> ≥70% of songs are ones almost no one ever listens to (stream count < 1000).
So much interesting but undiscovered music is out there!
It would be interesting to find out how that has changed with the growth of the music industry over the years. I suspect that many of these <1000 streamed could be artificially generated for monetary purposes but I'm not entirely sure. That being said, there is a lot of good music with less than 1000 streams. I've been looking myself and I've definitely found some hidden gems.
I just want to be able to backup my playlists. Maybe thats possible but last time I looked I could only find sites that wanted your login, not gonna happen.
This works nicely: https://github.com/spotDL/spotify-downloader
https://developer.spotify.com/documentation/web-api/referenc...
https://developer.spotify.com/documentation/web-api/referenc...
I bet you can whip up a super simple script with an LLM to do this!
Not that using the Spotify API directly is all that hard but the spotipy library makes it even easier.
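As a starting point, here is a minimal sketch of a playlist backup built around a spotipy-style client. It uses `current_user_playlists`, `playlist_items`, and `next` from spotipy; the function names `export_playlists` and `_playlist_tracks`, the output format, and the file name are my own choices for illustration. The Web API changes over time, so check the current docs before relying on it.

```python
import json


def _playlist_tracks(sp, playlist_id):
    """Collect "Artist - Title" strings for one playlist, following pagination."""
    tracks = []
    page = sp.playlist_items(playlist_id, limit=100)
    while page:
        for item in page["items"]:
            track = item.get("track")
            if not track:  # removed or unavailable entries can come back empty
                continue
            artists = ", ".join(a["name"] for a in track.get("artists", []))
            tracks.append(f"{artists} - {track['name']}")
        page = sp.next(page) if page.get("next") else None
    return tracks


def export_playlists(sp):
    """Return {playlist name: ["Artist - Title", ...]} for the current user.

    `sp` is expected to behave like a spotipy.Spotify client.
    Playlists that share a name overwrite each other in this simple version.
    """
    backup = {}
    page = sp.current_user_playlists(limit=50)
    while page:
        for playlist in page["items"]:
            backup[playlist["name"]] = _playlist_tracks(sp, playlist["id"])
        page = sp.next(page) if page.get("next") else None
    return backup


# Usage with a real client (requires `pip install spotipy` and your own app
# credentials in SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET / SPOTIPY_REDIRECT_URI):
#
#   import spotipy
#   from spotipy.oauth2 import SpotifyOAuth
#   client = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="playlist-read-private"))
#   with open("playlists_backup.json", "w", encoding="utf-8") as fh:
#       json.dump(export_playlists(client), fh, ensure_ascii=False, indent=2)
```

The login happens through Spotify's own OAuth page with your own registered app, so no third-party site ever sees your credentials.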
This is where ChatGPT shines. Just ask it to write you a script, it'll give you all the instructions.
I've used ChatGPT to write a whole bunch of playlist logic scripts (e.g. create a playlist that takes tracks from playlists A, B and C, but exclude tracks in playlist D.)
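The "A, B and C minus D" logic itself is only a few lines once the track IDs are in hand. A sketch, assuming each playlist is already a plain list of track ID strings (fetching and writing playlists through the API is left out, and `combine_playlists` is a made-up name):

```python
def combine_playlists(include, exclude):
    """Tracks from the `include` playlists in first-seen order, minus anything in `exclude`."""
    banned = set().union(*exclude)
    seen = set()
    result = []
    for playlist in include:
        for track_id in playlist:
            if track_id in banned or track_id in seen:
                continue
            seen.add(track_id)
            result.append(track_id)
    return result


# Hypothetical track IDs, for illustration only.
playlist_a = ["t1", "t2"]
playlist_b = ["t2", "t3"]
playlist_c = ["t4"]
playlist_d = ["t3"]
print(combine_playlists([playlist_a, playlist_b, playlist_c], [playlist_d]))
# ['t1', 't2', 't4']
```

A list is used for the result rather than a set so the merged playlist keeps a stable order and duplicates are dropped only once.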
Uh, cool, I guess? I want to applaud that, but, first off, unless you are OpenAI or Facebook, it is not exactly plausibly easy to participate in the festivities. Even if I had spare 300 TB laying around, how the fuck do I download that?
But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.
BitTorrent protocol doesn’t force you to download all of the files of a torrent :)
Now imagine a dedicated music client that will download and stream (and share, because we are polite) only the needed files :)
You can download torrents selectively. I think if they adopted that cautious attitude they wouldn't exist in the first place
Anna's archive mirrors z-lib and libgen, so those are the main alternatives. But it's unlikely anna's archive would go down so easily, they take a lot of precautions.
Oh, I was somehow under impression that libgen is no more. Glad to see it's not. I guess it was just a different domain.
think popcorn time for mp3s/flac instead of mp4.
a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.
it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted jellyfin". Where a person host a thin client on a machine that then fetches the data from the torrent, then allows the user to "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.
I don't really see why it wouldn't, from an end user perspective, be any different than a self hosted jellyfin or plexamp.
I am in no way saying that this is cheap but 300 TB will set you back a little less than $6k with tax. Very attainable for people other than OpenAI and Facebook. And it's not crazy at all to snag a server with enough bays to house all those.
For reference, considering you can purchase a 12-month Spotify Premium subscription via a $99 gift card at the moment, that same $6k could be used for 60 years of Spotify Premium.
For reference, considering the backup has 86 million music files, at an average of 3 minutes per file it would take you around 490 years to listen to all the tracks.
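Both "for reference" figures check out. A quick sketch of the arithmetic, using only the numbers quoted in the comments above ($6k of storage, a $99 yearly gift card, 86 million files, 3 minutes per file):

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25


def listening_years(track_count, avg_minutes=3):
    """Years of non-stop playback needed to hear every track once."""
    return track_count * avg_minutes / MINUTES_PER_YEAR


def subscription_years(budget_usd, yearly_price_usd=99):
    """Years of subscription the same budget would buy."""
    return budget_usd / yearly_price_usd


print(f"{listening_years(86_000_000):.1f} years of listening")    # about 490.5
print(f"{subscription_years(6_000):.1f} years of subscription")   # about 60.6
```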
I have a Supermicro 24 bay 2U in my house with an array around half that size in it. It’s not prohibitive.
Very interesting that a white noise track for babies is the 4th most popular track on Spotify.
Interesting if that is considered to be copyrightable. Any white noise track is perceptually indistinguishable from another, but none have the exact same sequence of samples except by chance, or if the noise generator happens to be deterministic as a function of time.
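The point about deterministic generators can be shown in a few lines. This is a sketch that assumes Gaussian white noise from Python's seeded pseudo-random generator; it illustrates only the sample-level argument and says nothing about how copyright law treats it.

```python
import random


def white_noise(seed, n_samples):
    """Gaussian white noise samples from a seeded, deterministic generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


a = white_noise(seed=1, n_samples=1000)
b = white_noise(seed=1, n_samples=1000)
c = white_noise(seed=2, n_samples=1000)

print(a == b)  # True: same seed, identical sample sequence
print(a == c)  # False: different seed, different samples, same statistics
```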
I find it so odd that people turn to streaming services for stuff like this. I have a dedicated white noise machine, and when I travel, I use the white noise (bright noise actually) built into the iPhone.
Relying on an external hosted service would never cross my mind, and surely wouldn’t be something I go to on a daily basis.
You might find it interesting that there's an entire genre of youtube video that's designed to just be chucked one by one into slideshows for elementary school teachers to use as their lesson plan. Including videos that are just "2 minute timer for kids!"
e.g. https://www.youtube.com/@Ask.the.Teacher
"Independent Reading: Count Up Timer for Classrooms": https://www.youtube.com/watch?v=AfLfJtVeME8 straight up just stock imagery and a timer lol
It's not odd if you aren't the type who frequents hacker news. We are, after all, very much in a bubble here.
Attracting the ire of the music industry seems like a huge, unnecessary risk. I wish they had performed this as some kind of other entity to try to keep the ebook archive protected from the fallout. I fear this will not end well.
I am not enthused by this news. Let us entertain the possibility that similar institutions will eschew this catalog.
I want to time-travel back to 2000 like Old Biff with the sports almanac so I can tell Shawn Fanning to use the "it's for historical preservation" defense.
New multimodal training set just dropped.
Looking at the analysis, I'm totally surprised opera and psytrance are so prolific.
Psy-trance... I thought it was the same as any other electronic genres, but do people get high and just start shoveling psy-trance tracks out or something?
Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?
I mean, I'm sure there's some misclassification, but chamber music is basically a couple people with any sort of music training on classical instruments so that doesn't surprise me nearly as much... I can easily imagine there being _lots_ of those, and you might come up with a different artist name for each unique set of people you collaborate with.
Former classical singer here. Only theory I can come up with is that opera tends to have large casts where all the singers are credited individually which would inflate the absolute numbers of "artists" relative to other genres. I still struggle to imagine this accounting for bringing such a niche genre to the top here.
> Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?
My guess is just the same opera performed by a ton of different orchestras, and perhaps the same orchestra for different recordings, times however many operas there are.
How legal is this with regards to copyright laws?
Not legal. This group does not concern themselves with copyright law.
they do concern themselves with it, but in a "calling it out for being shit" kind of way.
Adherence to the legal framework is a function of your risk appetite.
Currently it says they have released metadata and album art. Is archiving and sharing the textual track metadata alone (no images, no audio) legal in the US, or Europe? By what basis is it legal or illegal?
Very, if we delete copyright like we're supposed to.
Not legal
Completely illegal.
The metadata scrape might not be.
Pretty sure any kind of scraping violates Spotify’s ToS.
ToS is not law except in the most draconian and authoritarian interpretations of the CFAA.
You are mistaken, it’s contract law.
It's not. It's awful people justifying awful behaviour. And it's why we can't have nice things. There are always assholes ready to exploit others.
Monopoly is not a nice thing. Maybe it is convenient, but not nice.
The people who give money to artists are the ones going to concerts and buying music directly from artists. Spotify gives cents to artists, incentivizing awful behaviour (AI music, aggressive marketing, low effort art...).
There's some irony here considering Spotify used pirated mp3s at the start of their operations, I suppose.
Some people's urges to destroy all traces of human civilisation astonish me. What do you think Spotify is going to do with all its music when it ceases to exist in however many years? No, we must collectively feed Daniel Ek the Hungry.
Are you talking about Spotify here…?
lol is this comedy? Cuz it's absolutely hilarious opposite humor.
You're talking about Spotify, right? Famously started by ad execs pirating music and then selling it.
You must be the Spotify CEO, lol
For some reason, the link does not work for me (spain). Works perfect at the same time in tor browser.
Is there a torrent client already that is good at partial downloads? I didn't realize how Popcorn Time worked until I read this thread.
All torrent clients must necessarily support partial downloads because of the nature of torrents. The files are split into pieces which are downloaded and then assembled by the torrent client.
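To make the piece mechanics concrete: in the original (v1) BitTorrent format, the files of a multi-file torrent are laid out back to back as one byte stream, which is cut into fixed-size pieces. Fetching a single file means fetching the pieces that overlap its byte range. A sketch of that mapping, with a made-up function name and toy sizes:

```python
def pieces_for_file(file_lengths, piece_length, file_index):
    """Return the range of piece indices that contain bytes of one file.

    file_lengths lists file sizes in torrent order; all sizes are in bytes.
    """
    start = sum(file_lengths[:file_index])  # offset of the file in the stream
    length = file_lengths[file_index]
    if length == 0:
        return range(0)
    end = start + length  # exclusive
    first_piece = start // piece_length
    last_piece = (end - 1) // piece_length
    return range(first_piece, last_piece + 1)


# Toy example: three files of 100, 250 and 50 bytes, 128-byte pieces.
sizes = [100, 250, 50]
print(list(pieces_for_file(sizes, 128, 0)))  # [0]
print(list(pieces_for_file(sizes, 128, 1)))  # [0, 1, 2]
print(list(pieces_for_file(sizes, 128, 2)))  # [2, 3]
```

Pieces at file boundaries are shared with the neighbouring files, which is why clients doing selective downloads usually end up fetching a little more than the file they were asked for.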
199GB, only metadata released for now.
Magnet link found here: https://annas-archive.li/torrents/spotify
Are magnet links allowed on HN?
Is this all regions? I'm assuming so but I can't be sure
Oh this is going to go over real well in Nashville, TN.
I hope they get the new lossless versions
Is there a way to see the shape of the metadata?
great. Spotify just removes things all the time (things I actively listen to and work on for my jazz practices, one day just go "poof" because they didn't want to pay the record company anymore), and they are not as a company deserving of the role of "keeper of all the world's music". They don't give a shit and they'd vastly prefer we all listen to their AI generated royalty free crap and Joe Rogan.
is this not highly illegal?
Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.
Wasn't all data available to users though?
Yes but very hard to scrape in bulk from user accounts
I mean... not really? Not much music is Spotify exclusive (at least from the 99.6% of what people listen to mentioned in the article), and from friends in the industry I can guarantee you all major content platforms (Netflix, Disney+, Prime Video, a large chunk of YouTube) have already been completely copied without a business agreement with the rightsholders by AI startups and big-name players.
I wonder how definitive their collection is and how much ripping Google Music/YouTube would improve on this.
A distributed ripping project to do that would be a fine thing.
Wow. Now I just need some hard drives and a way to download that without my ISP doing something about it. That's amazing.
> and a way to download that without my ISP doing something about it.
what would your ISP do?
When I left my apartment back in 2018, I was switching the Comcast account over to my housemate who was staying on there. In doing so I discovered I had a myname2342@comcast.com email account. The UI showed something like 8,000 unread emails. Bemused, I opened it to see what kind of spam it had accumulated. None at all! It was just under 8,000 DMCA / torrent warning emails from Comcast itself. "We know you torrented The.Pokemon.Movie.2001.h264.mkv, you better stop that!"
A full year of these emails and nothing more than that ever happened.
(if you're wondering how I hit 8000 torrents, the answer is individual album torrents)
This reinforces my belief that this effort ("anna's...") is financially backed by Russia/Putin. The HN crowd probably won't see it though.
Think from a geopolitical perspective, not (just) a "copyright shouldn't exist" perspective. They claim "communism" as a motivation; Putin is looking to re-establish the Stalin Soviet Union.
Why... does Putin like music more than the next guy?
Why would you want to destroy your enemies' industries, is what you're asking?
Although I suppose that is predicated on seeing Russia as the enemy. Strangely not always the norm these days in the new world.
> Why would you want to destroy your enemies' industries, is what you're asking?
Do you have any evidence that pirating is destroying industries? My guess is I can find the majority of this release by anna's archive on some combination of the pirate bay and the soulseek, or private music trackers. And yet, Spotify is still a thriving company, as is the entire music industry as a whole. There's even room for competing streaming services like Tidal and Youtube Music.
Am I understanding this wrong? Ripping the metadata I'm fine with. But it sounds like they've ripped every song from Spotify and they're going to release them?
Edit: It seems like they are. Stealing from tens of thousands of artists, big and small, and calling it "preservation" or "archiving" is scummy.
The people I know who go through the trouble of pirating and downloading vast libraries of music are all musicians themselves, or at the very least total music nerds. They don’t want to lose access to their stuff, plus if they ever need to import audio into a DAW, DRM is a no-go. They are the same people who spend large amounts of money on vinyls, and support smaller independent artists through concerts, merch and (back in the day) CDs.
It used to be more mixed, but today, piracy is often the only option to ”own” any media at all.
The musicians I know are the most inclined to actually pay for music (NOT through Spotify) and buy merch.
It's both. Musicians and music nerds buy CDs and LPs and tapes and Bandcamp files and they "pirate" music both because they care about ownership and quality and rare or substantially different editions of records that aren't available legally, and because they've seen the sausage factory from the inside and know that "stealing" $0.02 from an artist who's starving like them anyway isn't really that far up on the list of heinous crimes. Buy the shirt, download the album. No one cares.
Music piracy is already a thing, not to mention you don't even need to torrent nowadays when music is available for free on YouTube. Those who don't want to pay already don't pay so nothing changes there.
The value of Spotify is the convenience, and this collection does not change that in any way. Your argument would apply if someone were to make a Spotify clone with the same UX using this data.
At least pirates provide some value from curation usually. In this case the leak is just all of Spotify. It makes it really easy for a competitor to just duplicate the Spotify service without paying licensing fees. Tbd what happens.
I don’t understand how the parent comment is downvoted yet this is not. “Stealing is ok because stealing is already a thing”… come on, now
Because it's not stealing. Stealing is a problem because it deprives the original owner of the item - whether the thief subsequently uses the item or not doesn't change that.
This doesn't apply to dematerialized content: the original copy still exists. The only negative impact occurs if someone decides to actually use the pirated copy in place of buying a licensed one.
The mere existence of this new pirate copy being around doesn't automatically imply that, especially if other, more convenient sources are available.
Okay, call it copyright infringement then if you want to be a stickler on definitions. It's still wrong, and existing instances of it don't make it justifiable to do.
Why is copyright infringement wrong?
Copying is not theft.
https://www.youtube.com/watch?v=IeTybKL1pM4
Stealing is not the correct word.
Don't worry, they let Spotify keep the original files.
Spotify can shut down any day. Even if it survives, it's removing content all the time. How are future generations supposed to study and listen to music if it is lost? Imho, someone has to do it.
Why is this stealing? You can already listen to everything that's on Spotify with a free account. You are free to also record the audio while it's playing. I suppose grabbing the actual file shouldn't matter? Or is this about releasing? And robbing people of plays they would otherwise get through Spotify?
> Why is this stealing?
It's not, theft involves taking something from someone, i.e. also depriving them of that thing.
This may be unauthorised copying aka piracy, but it's not theft.
Downloading it all in bulk is different than personal usage. It's like AI companies hoovering up everything.
If you listen to something on Spotify with a free account the artists still get paid. This isn't a case where you're ripping off some mega-corp. You're ripping off thousands of artists, from major label ones to tiny indies. Take the metadata and build something cool. Stealing the files and releasing them is something else entirely.
You can record what you play from Spotify and you are already free to play the record again and again and again without the artist being paid.
Most people do not because they find it less convenient than paying 20 bucks a month or whatever is the current price in 2025, but that doesn't change the reality.
For most people the appeal of Spotify is not the music itself but the playlists that are shared thanks to its ubiquity. This is the reason other services struggle to make a dent even if they have better quality, UI and algos.
Spotify started by disrupting the market using pirated music by the way so you are pretty much endorsing and encouraging piracy when "paying" your favorite artists through Spotify.
> with a free account the artists still get paid
Unless they're international stars, not really. It's peanuts these days. https://www.reddit.com/r/spotify/comments/13djsl9/how_much_d...
Nobody is gonna download a 300TB torrent just to get the latest Taylor Swift album. There are much easier avenues than that.
What’s actually scummy is Spotify paying artists $1 per 1000 streams.
Buy CDs. Use Bandcamp.
> What’s actually scummy is Spotify paying artists $1 per 1000 streams.
I'm pretty sure it's waaaay lower than that per 1000 streams.
> What’s actually scummy is Spotify paying artists $1 per 1000 streams.
My spotify wrapped says I listened for 50,000 minutes this year. Assuming 2 minutes per song, that's 25,000 streams. I paid them $110, aka $0.004/stream. Assuming I'm a typical user, they obviously could not afford to pay any more than that per stream.
I googled "spotify pay per listen" and the first result is a reddit comment saying "The average payout on Spotify is only $0.004 per stream." The google AI overview says "Spotify [..] pays artists a fraction of a cent, typically $0.003 to $0.005 per stream". So I'll assume it's something in that ballpark.
So it seems like Spotify's payouts are completely reasonable, given their pricing. Is my logic wrong somewhere?
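The back-of-the-envelope estimate above can be written out as a tiny Python sketch. It uses only the figures from the comment (50,000 minutes, an assumed 2 minutes per song, $110 paid). The function name `revenue_per_stream` is made up for illustration, and the result is just the ceiling on what one subscriber's fee could cover per stream, not a figure from Spotify.

```python
def revenue_per_stream(minutes_listened: float, minutes_per_song: float,
                       amount_paid: float) -> float:
    """Subscription money available per stream for a single listener."""
    streams = minutes_listened / minutes_per_song  # estimated number of plays
    return amount_paid / streams


# Figures from the comment: 50,000 minutes, ~2 minutes per song, $110 per year.
per_stream = revenue_per_stream(50_000, 2, 110)
print(f"{per_stream:.4f}")  # 0.0044
```

Longer average songs mean fewer streams for the same listening time, so the per-stream figure rises as the assumed song length goes up.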
That’s fun math. I just checked mine: 96000 minutes. 2 minutes per song is way too generous as an assumption, for me everything seems to be > 3 minutes so ~20000 streams.
I’m paying for a family account (that’s around 250/year) and there are 5 people on it so my usage is 1/5th of that (50/year)
So that’s 0.0025€ per stream. I don’t think your assumption is unreasonable.
I suppose it depends on what the mean listening time is. I suspect the kind of person who comments on a discussion about music would listen more.
> Nobody is gonna download a 300TB torrent just to get the latest Taylor Swift album
Well, no. They'll just select the album and download it selectively from the torrent.
No but the rip is a perfect tool for bad actors to profit from the music without paying licensing fees
How about we let the individual artists decide?
In most cases, they couldn't make that decision even if they wanted to. Only independent artists and those that are so large as to have enough sway (Neil Young for example) would be able to. The vast majority of artists you probably listen to don't actually own the rights to their own music.
So let the rights holders make the decision? They would never. Music rights exist for them to extract profit above all else. They don't care about preserving culture or legacy. Which is why it's important that somebody does.
Did they get to decide when their music was pirated and sold originally by Daniel Ek?
Spotify used pirated songs initially when they started it. So...
Hey, you should look up how Spotify got started. :)
While I wouldn't call this scummy I do agree with your sentiment. It is technically stealing and those copyrights should be respected.
Full disclosure, I am a career musician AND have been known to pirate material. That said, I think this is a valuable archive to build. There are a lot of recordings that will not endure without some kind of archiving. So while it's not a perfect solution, I do think it has an important role to play in preservation for future generations.
Perhaps it's best to have a light barrier to entry. Something like "Yes, you can listen to these records, but it should be in the spirit of requesting the material for review, and not just as a no-pay alternative to listening on Spotify." Give it just enough friction where people would rather pay the $12/month to use a streaming service.
Also, it's not like streaming services are a lucrative source of income for most artists. I expect the small amount of revenue lost to listeners of Anna's Archive are just (fractions of) a penny in the bucket of any income that a serious artist would stand to make.
> It is technically stealing
It is technically not. Stealing means you have a thing, I steal it, now I have the thing and you do not. You can’t steal a copyright (aside from something like breaking into your stuff and stealing the proof that you hold the copyright), and when a song is downloaded the original copyright holder still has their copy.
Calling piracy theft was MPAA/RIAA propaganda. Now people say that piracy is theft without ever even questioning it, so it was quite successful.
> Stealing means you have a thing, I steal it, now I have the thing and you do not.
that seems like an overly narrow definition… what about identity theft, or IP theft?
https://www.justice.gov/usao-ndca/pr/superseding-indictment-...
See my other comment. Identity theft is the bank being defrauded and passing the problem onto you. They are the victim, not you and it is their money that’s gone, not yours.
IP theft is more like espionage and possibly lost hypothetical revenue. Again, it isn’t larceny, burglary, etc. You still have the knowledge, it’s just that so does the perpetrator.
Moreover, discussions of IP get into whether it even makes sense to be able to patent algorithms which are at their core just mathematics. So before you can talk about stealing the quadratic formula you need to prove that the quadratic formula is something that can be property.
Mitchell & Webb's take on "identity theft" is worth a listen.
https://www.youtube.com/watch?v=CS9ptA3Ya9E
You may not be stealing the actual content, more so “making a copy”, but in doing that you’re taking away money the artist would have earned if you bought their album or streamed it on Spotify (admittedly that’s a very small amount for the artist, but that’s another thing).
And if I stole something physical you had for sale, you wouldn’t make the money, so the end result is effectively the same.
The “if you bought their album” is the non-trivial part of that sentence. A pirate is not necessarily going to fork over $20 for an album if they couldn’t pirate. Chances are they will simply not buy the album. In either case the artist doesn’t get their $1.20 (6% to the artist the rest to the studio and distributors). So the result is really not the same because the artist and the pirate can both have the album in different ways and in both cases the artist doesn’t get their $1.20 unlike a physical good which cannot be cloned.
What this really is exposing is that most art is not worth the same. A Taylor Swift album is not worth the same on the open market as a Joe Exotic album. Pricing both at say $20 is artificial. Realistically most music has near zero actual value, hence why if you are a B tier or lower artist you won’t make much compared to an A tier artist on platforms like Spotify or YouTube which pay per listen/watch.
Can you post your social security number and other personal info here then? You will still have it afterwards!
Oh also, I don't see why I should ever pay for trains or movie tickets if there are seats available. I can just walk in! The event will happen anyway. It's not stealing.
Everyone should just download all art, music and literature for free. Musicians, artists and writers can all make money some other way while I enjoy the works of their efforts.
https://www.sciencelearn.org.nz/images/straw-man-arguments
What the music/movie industry was claiming in court was not theft. There is no statute that identifies piracy as theft. They were claiming copyright violation and wanted to collect damages for lost revenue.
You are bringing up “identity theft” which is also not theft. If you post your PII here and I use it to open a credit card in your name and then spend a bunch of the money using that card on buying goods and services, you are not the victim. What I do in that case is defraud the bank. They are the ones who are the actual victim and in the ideal world they would be the ones working with the authorities to get their money back.
Of course they would rather not do that so they invented a crime called identity theft and convinced everyone that it is ok for them to make you the victim. They make your life hell since they can’t find the actual criminal while you spend thousands of dollars trying to prove that you don’t owe thousands of dollars. But in reality you were not any part of the fraud. It is on the bank to secure their system enough to prevent this. But they have big time lawyer money and you don’t so here you are.
Agree with you, this release is obviously a scummy thing to do.
Same as if someone released every book on Kindle for free. There are rules. Project Gutenberg is great. They don't just steal every book they can.
Not to mention the organization is openly trying to profit from this data by selling it to big tech orgs for AI training! None of the artists consented to that, I am sure, to say nothing of Spotify's interests.
On top of that they beg for donations.
You don't think that would be a good thing?
Everyone should just download all art, music and literature for free. Musicians, artists and writers can all make money some other way while I enjoy the works of their efforts.
Wow. Anna is a godsend. Hopefully now we get some really good open source music models
First we need good stem splitting
What do you think about the recent SAM audio model by meta? https://ai.meta.com/blog/sam-audio/
Is it realtime?
Yuck. Just to make it easier to train slop machines. The point of art is not to have completionist archives of EVERYthing that’s ever been made! Let it die. Death is the most natural part of life. Art is about the human experience, not “for researchers”.
The point is human connection. Art is a living reflection and record of human experience. Art will persevere- the kinds of folks who prioritize what they like based on popularity were never the supporters artists (contrast with craftspeople trying to make a buck) counted on in the first place. Enjoy your derivative slop - we’ll continue on our imperfect, messy, individual, human artistic lives.
I am having a lot of trouble following you. Something has upset you: what would make you feel better?
do you mean that researchers should be disallowed from accessing art?
I do not see how research interferes with all the benefits you prioritise. Can't you continue to enjoy those benefits?
Many people think 'real' music has electric guitars. I think they're wrong, but why argue with them? I think it's fine if you do not like music made from music, but that ship sailed last century. One detail you may be missing is that there are imperfect messy individual artistic humans who make music from music too. Computers are no more an obstacle to human connection through music than electric guitars are.
> I am having a lot of trouble following you. Something has upset you: what would make you feel better?
Don't talk to people like that here, please. It's passive aggressive and unproductive. GP's comment was fine, if not a bit impassioned, regardless of whether you agree with it.
thanks for the correction, I do not want to be aggressive.
I see now I should have just asked: what do you want?
to prefix my response with an admission that I'm not sure what the problem is.