The Kiwix Archive

From Sea of Fate
Revision as of 20:29, 20 April 2026 by Wikisailor

📖 Introduction

Kiwix is an offline content reader that allows you to browse massive websites, such as Wikipedia, StackExchange, or Project Gutenberg, without an internet connection.

  • The Format: It uses highly compressed .zim files. A single file can contain the entirety of Wikipedia (with images) or a complete medical encyclopedia.
  • The Goal: To provide a permanent, offline knowledge base that remains accessible even if the internet is down, serving everyone on your local network.
  • Synergy: Works alongside OpenAlex (scholarly search) and ArchiveBox (personal web snapshots) to create a three-tier local research library.

Once the ZIM files are downloaded, we can also extract their text into a collection of Markdown (.md) files. These can serve as a Retrieval-Augmented Generation (RAG) database for offline LLMs, or as fine-tuning data for one or more LLMs. Some of the .md files will be general knowledge, useful mainly for the RAG database; others will be dense question-and-answer tables of condensed information. The Stack Overflow archive, for example, will be a high-density data set.
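The extraction step described above could be sketched as follows. This is a minimal sketch, not a tested pipeline: it assumes zimdump (from the zim-tools package) and pandoc are installed, and the function name zim_to_md and the html/md output layout are made up for illustration. ZIM entries often have no file extension, so pages are detected by looking for an <html tag, which is only a heuristic.

```shell
# Sketch: dump a ZIM archive to plain files, then convert HTML pages to Markdown.
# Assumes zimdump (zim-tools) and pandoc are on PATH; names and paths are illustrative.
zim_to_md() {
    local zim_file="$1" out_dir="$2"
    local dump_dir="$out_dir/html" md_dir="$out_dir/md"
    local page rel target

    mkdir -p "$md_dir" || return 1

    # Unpack every entry of the archive into $dump_dir.
    zimdump dump --dir="$dump_dir" "$zim_file" || return 1

    # Walk the dump; file names containing newlines are not handled here.
    find "$dump_dir" -type f | while IFS= read -r page; do
        # Heuristic: only files that contain an <html tag are treated as pages.
        grep -qi '<html' "$page" || continue
        rel="${page#"$dump_dir"/}"
        target="$md_dir/$rel.md"
        mkdir -p "$(dirname "$target")"
        # Convert one page to GitHub-flavoured Markdown.
        pandoc -f html -t gfm "$page" -o "$target"
    done
}

# Example (placeholder paths):
# zim_to_md /mnt/docker_data/stacks/kiwix-archive/zim/some_archive.zim /tmp/some_archive
```

The resulting md tree mirrors the entry paths inside the archive, which keeps a link back to the original article when the files are later chunked for a RAG database.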

💾 The Infrastructure

Blackberry has been slimmed down to be more efficient now that indexing is handled elsewhere.

  • Host: Blackberry
  • VM Config: Debian | 4 Cores | 6GB RAM.
  • Storage: 4TB XFS disk mounted at /mnt/docker_data, plus an additional 5TB XFS disk for ArchiveBox mounted at /mnt/archive_data.

Linux Commands

A set of Linux commands that show the progress of indexing and file copying.
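As one possible starting point, the helper below prints a one-line summary of a ZIM directory using only find, du, and wc. The function name zim_progress is hypothetical, and the default path is the ZIM directory used elsewhere on this page.

```shell
# Sketch: print how many finished .zim files a directory holds and its total size.
zim_progress() {
    local dir="${1:-/mnt/docker_data/stacks/kiwix-archive/zim}"
    local count size
    # Count only top-level files ending in .zim (partial downloads are ignored).
    count=$(find "$dir" -maxdepth 1 -type f -name '*.zim' | wc -l | tr -d ' ')
    # Human-readable size of everything in the directory, partial files included.
    size=$(du -sh "$dir" | cut -f1)
    printf '%s: %s ZIM files, %s on disk\n' "$dir" "$count" "$size"
}

# Example: refresh every 10 seconds until interrupted with Ctrl-C.
# while sleep 10; do zim_progress; done
#
# Free space on the data disk:
# df -h /mnt/docker_data
```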

🐋 The Software Stack (Docker)

Installing Docker & Compose

Before installing Dockge, we must install the Docker engine and the Compose plugin from Docker's official Debian repository.

# Update and install dependencies
sudo apt update && sudo apt install -y ca-certificates curl gnupg
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine and Compose Plugin
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Optional: allow your user to run docker without sudo
# (log out and back in for the group change to take effect)
sudo usermod -aG docker "$USER"
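Before moving on to Dockge, it can help to confirm that the engine and the Compose plugin both respond. The check below is a small sketch; the function name verify_docker is made up, and the last step pulls Docker's hello-world test image, so it needs internet access. If the group change has not taken effect yet, run it from a root shell instead.

```shell
# Sketch: stop at the first Docker component that does not respond.
verify_docker() {
    docker --version || return 1          # engine CLI installed?
    docker compose version || return 1    # Compose v2 plugin installed?
    docker run --rm hello-world           # daemon can pull and run a container
}

# Example:
# verify_docker && echo "Docker is ready"
```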

🛠️ Installing Dockge

Dockge allows us to manage our "Stacks" (Docker Compose files) through a clean web interface.

# Preparation: Create directories
mkdir -p /opt/stacks /opt/dockge
cd /opt/dockge
# Download and Start Dockge
curl https://raw.githubusercontent.com/louislam/dockge/master/compose.yaml --output compose.yaml
docker compose up -d

🛠️ Preparation: Storage Folders

Organize the ZIM files on the 4TB disk (mounted at /mnt/docker_data) so the container can find them easily.

mkdir -p /mnt/docker_data/stacks/kiwix-archive/zim/

📄 Kiwix YAML (The Stack)

Deploy this in your Dockge instance on Blackberry (port 5001) and name the stack kiwix.

services:
  kiwix:
    image: ghcr.io/kiwix/kiwix-serve:latest
    container_name: kiwix_wikipedia
    volumes:
      - /mnt/docker_data/stacks/kiwix-archive/zim:/data
    ports:
      - 8081:8080
    command:
      - --library
      - library.xml
    restart: unless-stopped
networks: {}

🌐 Accessing and Using the Library

Tool | URL | Purpose
Kiwix Web UI | http://blackberry:8081 | Browse your downloaded offline libraries (host port 8081 maps to the container's port 8080).
ZIM Library | https://kiwix.org/en/download | Download new content (Wikipedia, Stack Overflow, etc.).
German ZIM Library | https://ftp.fau.de/kiwix/zim/ | Directory listing of ZIM files.
Kiwix.org | https://download.kiwix.org/zim/ | Further directory listings of ZIM files.
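ZIM files are large, so a download that can resume after an interruption is worth having. The wrapper below is a sketch that assumes wget is installed; the function name fetch_zim is hypothetical, and the URL must be copied from one of the listings above (no specific file name is implied here).

```shell
# Sketch: download one ZIM file into the library directory, resuming if a partial file exists.
fetch_zim() {
    local url="$1"
    local dest="${2:-/mnt/docker_data/stacks/kiwix-archive/zim}"
    [ -n "$url" ] || { echo "usage: fetch_zim URL [DEST_DIR]" >&2; return 2; }
    mkdir -p "$dest" || return 1
    # --continue resumes a partial file; --directory-prefix sets the target directory.
    wget --continue --directory-prefix="$dest" "$url"
}

# Example (replace the placeholder with a real link from a listing):
# fetch_zim "https://download.kiwix.org/zim/<category>/<file>.zim"
```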

Indexing Files after download

There are three helper scripts in the /mnt/docker_data/stacks/kiwix-archive/zim/ directory that help add new ZIM files to the XML index.

Filename | Purpose
./audit_incomplete.sh | Scans the ZIM directory for incomplete files and moves them to the incomplete directory.
./sweep_parts.sh | Checks the incomplete directory for files that have a corresponding fully downloaded ZIM.
./index_vault.sh | Adds new ZIM files to the XML index file.
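The contents of these scripts are not reproduced on this page. Purely as an illustration of what an indexing pass might look like, the sketch below adds every ZIM file that is not yet mentioned in library.xml. It is not the actual index_vault.sh: the function name index_new_zims is made up, it assumes kiwix-manage (from kiwix-tools) is installed on the host, and it assumes library.xml sits next to the ZIM files, as the /data mount and the --library library.xml argument in the stack suggest.

```shell
# Hypothetical sketch of an indexing pass; NOT the real index_vault.sh.
# The body runs in a subshell (parentheses) so the cd does not affect the caller.
index_new_zims() (
    zim_dir="${1:-/mnt/docker_data/stacks/kiwix-archive/zim}"
    cd "$zim_dir" || exit 1
    for zim in *.zim; do
        [ -e "$zim" ] || continue    # the glob matched nothing
        # Skip files whose name already appears in the library (simple text match).
        if [ -f library.xml ] && grep -qF -- "$zim" library.xml; then
            continue
        fi
        # Register the file using a path relative to library.xml.
        kiwix-manage library.xml add "$zim" || exit 1
    done
)

# Example:
# index_new_zims /mnt/docker_data/stacks/kiwix-archive/zim
```

Working from inside the ZIM directory keeps the file references relative, so the same library.xml is intended to stay usable when the directory is mounted at /data in the container.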