The Web Archive (ArchiveBox)

From Sea of Fate

Revision as of 08:23, 9 February 2026

📖 Introduction

ArchiveBox is a self-hosted web archiving solution. Unlike a simple bookmark, it takes a "snapshot" of a page in multiple formats so that if the original site goes down, you still have the full content.

  • The Outputs: For every URL you save, it creates a PDF, a Screenshot (PNG), a Single-File HTML, and a Wget clone.
  • The Goal: To build a searchable, permanent record of the specific web resources you use for research, separate from the broad scale of OpenAlex.
  • Synergy: Use OpenAlex to find a paper, use Kiwix for general encyclopedia background, and use ArchiveBox to save the specific blog posts or project wikis that support your work.

💾 The Infrastructure

ArchiveBox is heavy on disk I/O and storage, which is why it gets the dedicated 5TB drive.

  • Host: Blackberry (Proxmox VM)
  • Compute: Uses the same 4 Cores / 6GB RAM as the rest of the stack.
  • Storage: 5TB XFS disk mounted at /mnt/archive_data/archivebox, plus an additional 4TB XFS disk mounted at /mnt/docker_data for use with The Kiwix Archive

Note: ArchiveBox can grow very fast (approx. 1GB per 1000 articles).
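As a rough sanity check on that growth rate, the arithmetic can be sketched in shell. This assumes decimal units (1 TB = 1000 GB) and takes the 1GB per 1000 articles figure at face value; real snapshot sizes vary widely with media-heavy pages, so treat the result as an order-of-magnitude estimate only.

```shell
# Rough capacity estimate for the 5TB archive disk.
# Assumptions: 1 TB = 1000 GB, and ~1 GB per 1000 articles (from the note above).
disk_gb=$(( 5 * 1000 ))          # 5 TB expressed in GB
articles_per_gb=1000             # ~1000 articles fit in 1 GB
max_articles=$(( disk_gb * articles_per_gb ))
echo "Approximate capacity: ${max_articles} articles"
```

Under these assumptions the drive holds on the order of five million snapshots before filling up.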

🐋 The Software Stack (Docker)

Installing Docker & Compose

Before installing Dockge, we must install the Docker engine and the Compose plugin from Docker's official repository on Debian.

# Update and install dependencies
sudo apt update && sudo apt install -y ca-certificates curl gnupg
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine and Compose Plugin
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Optional: Allow your user to run docker without sudo (takes effect after logging out and back in)
sudo usermod -aG docker "$USER"
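
Once the packages are installed, it may be worth confirming that both the engine and the Compose plugin are visible before moving on. The following is a minimal sketch; `check_tool` is a hypothetical helper name introduced here for illustration, while `docker --version` and `docker compose version` are the standard version commands.

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not part of Docker.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool docker

# If docker is present, print the engine and Compose plugin versions.
if command -v docker >/dev/null 2>&1; then
  docker --version
  docker compose version || echo "compose plugin not found"
fi
```

If `docker` is reported as missing, re-check the repository line written to /etc/apt/sources.list.d/docker.list before retrying the install.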

🛠️ Installing Dockge

Dockge allows us to manage our "Stacks" (Docker Compose files) through a clean web interface.

# Preparation: Create directories
mkdir -p /opt/stacks /opt/dockge
cd /opt/dockge
# Download and Start Dockge
curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
docker compose up -d
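
After the stack starts, a quick check can confirm that the container is up and the web interface answers. This sketch assumes Dockge publishes its UI on port 5001, which is the value in the upstream compose.yaml at the time of writing; confirm it against the `ports:` entry in the file you downloaded. `dockge_url` and `check_dockge` are hypothetical helper names introduced here.

```shell
# Assumption: Dockge's web UI is published on port 5001 (verify in compose.yaml).
DOCKGE_PORT="${DOCKGE_PORT:-5001}"

# Build the UI address for a given host (defaults to localhost).
dockge_url() {
  echo "http://${1:-localhost}:${DOCKGE_PORT}"
}

# Show container status, then probe the UI with a short timeout.
check_dockge() {
  (cd /opt/dockge && docker compose ps) || echo "could not query the Dockge stack"
  if curl -fsS --max-time 5 -o /dev/null "$(dockge_url "$1")"; then
    echo "Dockge is responding at $(dockge_url "$1")"
  else
    echo "Dockge is not reachable at $(dockge_url "$1")"
  fi
}
```

Run `check_dockge` on the VM itself, or `check_dockge <hostname>` from another machine on the network, then open the printed address in a browser to finish the initial setup.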