The Web Archive (ArchiveBox)
From Sea of Fate
Revision as of 08:22, 9 February 2026 (by Wikisailor)
Introduction
ArchiveBox is a self-hosted web archiving solution. Unlike a simple bookmark, it takes a "snapshot" of a page in multiple formats so that if the original site goes down, you still have the full content.
- The Outputs: For every URL you save, it stores (among other formats) a PDF, a screenshot (PNG), a single-file HTML snapshot, and a wget clone of the page.
- The Goal: To build a searchable, permanent record of the specific web resources you use for research, separate from the broad scale of OpenAlex.
- Synergy: Use OpenAlex to find a paper, use Kiwix for general encyclopedia background, and use ArchiveBox to save the specific blog posts or project wikis that support your work.
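Because every saved URL produces several outputs, it can help to tidy a research reading list before handing it to ArchiveBox. The following is a minimal Bash sketch that trims whitespace, skips blank lines and `#` comments, keeps only http(s) URLs, and removes duplicates while preserving order. The function name `clean_url_list` and the file `research-links.txt` are hypothetical. The usage lines assume a Compose service named `archivebox`.

```shell
#!/usr/bin/env bash
# clean_url_list FILE: print the http(s) URLs in FILE, one per line.
# Hypothetical helper: trims surrounding whitespace (including CRLF endings),
# drops blank lines, "#" comments and non-http(s) schemes, and removes
# duplicates while keeping the first-seen order.
clean_url_list() {
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' "$1" \
    | grep -E '^https?://[^[:space:]]+$' \
    | awk '!seen[$0]++'
}

# Example usage (assumes a Compose service named "archivebox"):
#   mapfile -t urls < <(clean_url_list research-links.txt)
#   (( ${#urls[@]} )) && docker compose run archivebox add "${urls[@]}"
```

`archivebox add` accepts one or more URLs as arguments. Passing the whole cleaned list in a single call avoids starting a new container for every link.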
The Infrastructure
ArchiveBox is heavy on disk I/O and storage, which is why it gets the dedicated 5TB drive.
- Host: Blackberry (Proxmox VM)
- Compute: Shares the same 4 cores / 6 GB RAM with the rest of the stack.
- Storage: 5TB XFS disk mounted at /mnt/archive_data/archivebox, and an additional 4TB XFS disk mounted at /mnt/docker_data for use with The Kiwix Archive
Note: ArchiveBox can grow very fast (approx. 1GB per 1000 articles).
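The growth figure makes a rough capacity estimate straightforward. At about 1 GB per 1,000 articles, the 5 TB disk (≈5,000 GB) holds on the order of 5,000 × 1,000 = 5,000,000 pages. The following is a minimal Bash sketch that checks the archive path really sits on XFS and turns the free space on that path into such an estimate. The function names are hypothetical. It assumes GNU coreutils (`stat -f`, `df --output`). The 1 GB per 1,000 figure is only the rough heuristic from the note above.

```shell
#!/usr/bin/env bash
# fs_type_is PATH TYPE: succeed if PATH lives on a filesystem of TYPE,
# as reported by GNU stat (prints e.g. "xfs").
fs_type_is() {
  local actual
  actual=$(stat -f -c %T "$1" 2>/dev/null) || return 1
  [ "$actual" = "$2" ]
}

# pages_for_gb GB [GB_PER_1000]: rough number of pages that fit in GB,
# defaulting to the "1 GB per 1000 articles" heuristic.
pages_for_gb() {
  local gb=$1 gb_per_1000=${2:-1}
  echo $(( gb * 1000 / gb_per_1000 ))
}

# free_gb PATH: free space on PATH's filesystem in whole GiB (GNU df -BG);
# GiB vs GB is close enough for a rough estimate.
free_gb() {
  local n
  n=$(df --output=avail -BG "$1" 2>/dev/null | tail -n 1 | tr -dc '0-9')
  [ -n "$n" ] || return 1
  echo "$n"
}

# Example on the archive host (paths from this page):
#   fs_type_is /mnt/archive_data/archivebox xfs || echo "archive volume is not XFS"
#   echo "room for ~$(pages_for_gb "$(free_gb /mnt/archive_data/archivebox)") more pages"
```

For the full 5 TB disk, `pages_for_gb 5000` prints 5000000.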
The Software Stack (Docker)
Installing Docker & Compose
Before installing Dockge, we must install Docker Engine and the Compose plugin from Docker's official Debian repository.
# Update and install dependencies
sudo apt update && sudo apt install -y ca-certificates curl gnupg
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine and Compose Plugin
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Optional: Allow your user to run docker without sudo
# (takes effect after logging out and back in; docker group membership is root-equivalent)
sudo usermod -aG docker "$USER"
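After the install, a quick check can confirm the pieces are in place before moving on to Dockge. The following is a minimal Bash sketch. The function names are hypothetical. It only inspects the client side: the `docker` CLI, the `docker compose` plugin, and whether the current session already has the `docker` group. It does not verify that the Docker daemon is running.

```shell
#!/usr/bin/env bash
# in_current_groups GROUP: succeed if the current process is in GROUP.
# `id -nG` with no user name reports the current process's groups, so a
# fresh `usermod -aG docker` only shows up after logging out and back in.
in_current_groups() {
  id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# check_docker_install: report on the Docker CLI, the Compose plugin and
# docker group membership; returns non-zero if the CLI or plugin is missing.
check_docker_install() {
  local status=0
  if command -v docker >/dev/null 2>&1; then
    echo "docker CLI: $(docker --version)"
  else
    echo "docker CLI: missing"
    status=1
  fi
  if docker compose version >/dev/null 2>&1; then
    echo "compose plugin: $(docker compose version)"
  else
    echo "compose plugin: missing"
    status=1
  fi
  if in_current_groups docker; then
    echo "docker group: active in this session"
  else
    echo "docker group: not active (log out and back in after usermod)"
  fi
  return "$status"
}

# Example: check_docker_install || echo "Docker setup incomplete"
```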