Building a Personal Archive with Hoarder

(brainsteam.co.uk)

152 points | by edward 4 months ago

21 comments

bdeshi 4 months ago
btw the hoarder project is an active victim of a patent troll[0][1]; the official Firefox extension is currently blocked by dmca[2]. any donations might be helpful.
[0]: https://github.com/hoarder-app/hoarder/commit/b2c795ccb562c0...
[1]: https://www.reddit.com/r/selfhosted/s/CMCPP7cc8i
[2]: https://github.com/hoarder-app/hoarder/issues/899
nickthegreek 4 months ago
Set this up a couple of weeks ago using a Proxmox LXC script and have it using Ollama to create tags. I hadn’t heard of SingleFile before. That seems like an excellent pairing.
- radicality 4 months ago
  I recently got started with Proxmox too. Any thoughts/recommendations on running such tools in lxc, vs in a proxmox VM that has docker?
  - regularjack 4 months ago
    In my experience, LXC uses far fewer resources than VMs, so I typically prefer LXC over a VM. But in all honesty, I just use whichever is available at https://community-scripts.github.io/ProxmoxVE/
  - nickthegreek 4 months ago
    The big thing to know is that, depending on your GPU, a VM will want to reserve it, making it unavailable to your LXCs. But LXCs can share a GPU. There might be some setup you can do with certain cards to create a vGPU to allow VM sharing, but that’s a headache I didn’t want to deal with after getting my Nvidia drivers set up on the host and shared to the LXC. Use the tools that regularjack posted; /r/selfhosted and /r/proxmox are good resources. ChatGPT is pretty well versed in this stuff as well.
3eb7988a1663 4 months ago
Thoughts on this vs something like ArchiveBox?
- _ache_ 4 months ago
  Not really the same goal. In Hoarder, the goal is to tag content and make it easily searchable. The cached part is a plus, not the main goal.
  Actually, it's good, but it's not a full cached archive; it's just a cached zen-mode version of the webpage (or the full file if it is a PDF, EPUB, ...).
Tepix 4 months ago
Talking about hoarding, LTO tapes are the king of cheap storage, but if you want to archive significant amounts (hundreds of TB or more), it takes a significant investment to buy a tape library with a somewhat recent drive. Too bad there aren't any alternatives - or are there?
- wkat4242 4 months ago
  Tapes are really crap for home use though. They're expensive and super noisy. You constantly have to change them while backing up.
  What I do now is use a whole box full of older hard drives that I replaced in my NAS. I basically use them as tapes, with a removable drive frame.
  - Tepix 4 months ago
    Yeah, that's why i wrote that you need a tape library so you change 8 tapes at a time. If you have LTO-7, writing 8*6TB = 48 TB before having to change tapes sounds pretty good.
    - wkat4242 4 months ago
      Hm yeah, but those tapes are not really a lot cheaper than an HDD of that capacity. And a tape library is very expensive, huge and noisy.
      - roygbiv2 4 months ago
        And as I found out, the drives are temperamental. I had a tape library and eventually both drives said they needed cleaning, even after cleaning. When it worked it was great, though a cheap NAS with a couple of hard drives in it replaced it and was far more reliable and cheaper.
      - Tepix 4 months ago
        Here i'm seeing the cheapest HDD at a cost of 15€/TB and LTO-9 tape at 4.72€/TB. That's more than a 3x difference.
    - technopol 4 months ago
      How long do they last, and what will you do when they stop making tapes and equipment to read them?
      I ask because I came from a generation with a lot of tapes (reels, cassettes, 8-track, Betamax, VHS, etc.). Cassettes are coming back a little, but not much. I know long-term storage still uses tapes, but I wonder for how long. What happens when we run out of the resources to make them? Is there no better and safer long-term media that is affordable? A magnetic event could wipe them all.
      - alpaca128 4 months ago
        Tapes are still being actively developed for archiving by companies like Fuji, Sony and IBM. They’re not going away any time soon.
        And if a magnetic event is strong enough to wipe all your tapes you probably have bigger problems on your hands than a fried backup.
      - Tepix 4 months ago
        I think you're good for 20 years or so if you store the tapes well. Pretty much all of the industry is using LTO tapes so i don't see them going away soon.
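
The per-TB prices quoted in the tape sub-thread above (15€/TB for HDD, 4.72€/TB for LTO-9 media) can be checked with a rough sketch like the one below. It also estimates how much data you would need to archive before the cheaper media pays for the tape hardware. The function names and the hardware cost are hypothetical, chosen for illustration only; nobody in the thread quoted a drive or library price.

```python
# Media prices per TB as quoted in the thread.
HDD_EUR_PER_TB = 15.0
LTO9_EUR_PER_TB = 4.72


def price_ratio(hdd_per_tb: float, tape_per_tb: float) -> float:
    """How many times more expensive HDD storage is per TB than tape media."""
    return hdd_per_tb / tape_per_tb


def break_even_tb(hardware_cost_eur: float, hdd_per_tb: float, tape_per_tb: float) -> float:
    """Capacity (in TB) at which tape media savings cover the up-front hardware cost."""
    saving_per_tb = hdd_per_tb - tape_per_tb
    if saving_per_tb <= 0:
        raise ValueError("tape must be cheaper per TB for a break-even point to exist")
    return hardware_cost_eur / saving_per_tb


if __name__ == "__main__":
    ratio = price_ratio(HDD_EUR_PER_TB, LTO9_EUR_PER_TB)
    print(f"HDD costs {ratio:.2f}x as much per TB as LTO-9 media")

    # ASSUMPTION: 4000 EUR is a made-up hardware cost, not a quoted price.
    assumed_hardware_cost = 4000.0
    tb = break_even_tb(assumed_hardware_cost, HDD_EUR_PER_TB, LTO9_EUR_PER_TB)
    print(f"Break-even at roughly {tb:.0f} TB for the assumed hardware cost")
```

This ignores power, replacement drives, and the HDD enclosure or NAS, so it is only a first-order comparison of media cost against up-front hardware cost.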
nirav72 4 months ago
Didn't realize Hoarder now supports the SingleFile extension. Amazing.
Regarding Hoarder - by self-hosting Hoarder, I was able to cancel my $40/year subscription to Pocket. With the money saved, I added $10 of OpenAI API credits and use gpt-4o-mini for tagging. I don't have a powerful enough GPU to self-host Ollama on my NAS where I'm hosting Hoarder. But gpt-4o-mini is dirt cheap for these types of use cases.
seltzered_ 4 months ago
Worth noting that Linkding (what the author migrated from to Hoarder) also now supports page archiving via headless Chrome + SingleFile and also via manual upload: https://linkding.link/archiving/
lurking_swe 4 months ago
Can Hoarder archive a webpage protected by some kind of auth / login?
- goatsi 4 months ago
  That's what SingleFile is for. Hoarder fetches the webpage using its own browser; SingleFile makes a copy using your browser, including any sessions, then sends that to Hoarder.
  - lurking_swe 4 months ago
    sounds promising! thanks, i’ll look into this.