The Web Archive (ArchiveBox)
From Sea of Fate
Wikisailor (talk | contribs)
Revision as of 08:20, 9 February 2026
📖 Introduction
ArchiveBox is a self-hosted web archiving solution. Unlike a simple bookmark, it takes a "snapshot" of a page in multiple formats so that if the original site goes down, you still have the full content.
- The Outputs: For every URL you save, it creates a PDF, a Screenshot (PNG), a Single-File HTML, and a Wget clone.
- The Goal: To build a searchable, permanent record of the specific web resources you use for research, separate from the broad scale of OpenAlex.
- Synergy: Use OpenAlex to find a paper, use Kiwix for general encyclopedia background, and use ArchiveBox to save the specific blog posts or project wikis that support your work.
💾 The Infrastructure
ArchiveBox is heavy on disk I/O and storage, which is why it gets the dedicated 5TB drive.
- Host: Blackberry (Proxmox VM)
- Compute: Uses the same 4 Cores / 6GB RAM as the rest of the stack.
- Storage: 5TB XFS disk mounted at /mnt/archive_data/archivebox, plus an additional 4TB XFS disk mounted at /mnt/docker_data for use with The Kiwix Archive
Note: ArchiveBox can grow very fast (approx. 1GB per 1000 articles).
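To put the growth note in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the approximate 1GB per 1000 articles figure and the 5TB disk size from this page. The function names, the decimal conversion (1TB = 1000GB), and the choice to ignore filesystem overhead are assumptions made for illustration, not measured values.

```python
# Rough capacity estimate for the ArchiveBox disk.
# Assumptions (illustrative only): 1 TB = 1000 GB, no filesystem overhead,
# and the approximate growth rate quoted in the note above.

GB_PER_1000_ARTICLES = 1.0  # approximate figure from the note
DISK_TB = 5                 # dedicated ArchiveBox disk


def estimated_usage_gb(articles: int, gb_per_1000: float = GB_PER_1000_ARTICLES) -> float:
    """Estimate the disk space (in GB) used by a given number of archived articles."""
    return articles / 1000 * gb_per_1000


def articles_until_full(disk_tb: float = DISK_TB, gb_per_1000: float = GB_PER_1000_ARTICLES) -> int:
    """Estimate how many articles fit on the disk before it fills up."""
    disk_gb = disk_tb * 1000
    return int(disk_gb / gb_per_1000 * 1000)


if __name__ == "__main__":
    print(f"25,000 articles: about {estimated_usage_gb(25_000):.0f} GB")
    print(f"Articles until a {DISK_TB}TB disk is full: about {articles_until_full():,}")
```

Real usage will depend on which output formats are enabled and how media-heavy the saved pages are, so treat these numbers as an order-of-magnitude guide only.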