Data Archive

From Sea of Fate

Revision as of 08:13, 9 February 2026

Introduction

We are building an offline vault of data for use over the next few years. To accomplish that, we need one or more virtual machines that can download and store the data. Because the archival load falls mostly on CPU, storage and internet bandwidth, we keep this part separate from Quince, the AI and Jellyfin host that uses GPU passthrough. For more general notes on Linux, Docker and GPU passthrough, check Here. The Blackberry VM hosts Docker and some of the less intensive but equally useful tools.
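Because the archive services are split across hosts, a quick reachability check is handy when something stops responding. The sketch below is a minimal example, assuming the hostname `blackberry` resolves on the local network; only the Kiwix port (8081) comes from this page, and the `SERVICES` table and function names are hypothetical. Other Blackberry services, such as ArchiveBox, could be added to the table once their ports are confirmed.

```python
import http.client
import urllib.request

# Hypothetical inventory: only Kiwix on Blackberry:8081 is documented on this page.
# Assumes the hostname "blackberry" resolves on the local network.
SERVICES = {
    "kiwix": ("blackberry", 8081),
}


def service_url(name: str, services: dict = SERVICES) -> str:
    """Return the base HTTP URL for a named service."""
    host, port = services[name]
    return f"http://{host}:{port}/"


def is_up(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with a non-error HTTP status within the timeout."""
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout or HTTP error status.
        return False


if __name__ == "__main__":
    for name in SERVICES:
        url = service_url(name)
        print(f"{name:<8} {url:<28} {'up' if is_up(url) else 'DOWN'}")
```

Running it from Quince or any other machine on the network gives a one-line status per service.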

Reasons for a Data Archive

We want a completely offline copy of as much of the data currently on the WWW as possible. The reason for doing this now is that the new wave of LLMs and AI tools is producing its own summarised and sanitised versions of the wealth of data that has accumulated over the last 30 or so years.

OpenAlex

OpenAlex is a massive, open-source catalog of the global research system, containing over 250 million "Works" (papers, datasets, etc.). With a web dashboard for constructing queries to search for papers, it will complement the other research tools in the archive.
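The dashboard is the main way to build queries, but the same records can also be searched without it. As a rough sketch, assuming the archive holds an OpenAlex snapshot laid out as gzipped JSON Lines part files (one Work object per line, with fields such as `id`, `display_name` and `publication_year`), a plain Python scan could look like this. The directory path and function names are hypothetical, and a linear scan over 250 million works would be slow, so in practice the data would be loaded into a database first.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator, Optional


def iter_works(snapshot_dir: str) -> Iterator[dict]:
    """Yield Work records from every gzipped JSON Lines part file under snapshot_dir."""
    for part in sorted(Path(snapshot_dir).rglob("*.gz")):
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


def search_titles(snapshot_dir: str, term: str,
                  min_year: Optional[int] = None) -> list[tuple]:
    """Case-insensitive title search, optionally keeping only publication_year >= min_year."""
    term = term.lower()
    hits = []
    for work in iter_works(snapshot_dir):
        title = work.get("display_name") or ""
        year = work.get("publication_year")
        if term not in title.lower():
            continue
        if min_year is not None and (year is None or year < min_year):
            continue
        hits.append((work.get("id", ""), title, year))
    return hits


if __name__ == "__main__":
    # Hypothetical location of the downloaded works snapshot.
    for work_id, title, year in search_titles("/srv/openalex/data/works", "web archiving", 2015):
        print(year, work_id, title)
```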

The Kiwix Archive

The Kiwix Archive is hosted in a Docker container on Blackberry:8081 and is used to access an encyclopaedia made up of Wikipedia and a wide variety of related information sources. Most of the data dates from mid-2025 or early 2026, captured before the internet becomes so flattened and AI-washed as to be unrecognisable.
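For reference, here is a sketch of how a Kiwix container mapped to port 8081 might be launched. It assumes the official `ghcr.io/kiwix/kiwix-serve` image, which listens on port 8080 inside the container and serves the ZIM files mounted at `/data`; the host ZIM directory, the container name and the helper function are hypothetical, and the actual Blackberry setup may well use Docker Compose instead. The function only builds the command, so it can be reviewed before anything is run.

```python
import shlex
from pathlib import Path

# Assumption: the official kiwix-serve image, listening on 8080 inside the container.
IMAGE = "ghcr.io/kiwix/kiwix-serve"
CONTAINER_PORT = 8080


def kiwix_run_command(zim_dir: str, host_port: int = 8081, name: str = "kiwix") -> list[str]:
    """Build (but do not run) a `docker run` command serving every ZIM file in zim_dir."""
    return [
        "docker", "run", "-d",
        "--name", name,                                # hypothetical container name
        "--restart", "unless-stopped",                 # come back up after a VM reboot
        "-v", f"{Path(zim_dir).resolve()}:/data",      # host ZIM directory -> /data
        "-p", f"{host_port}:{CONTAINER_PORT}",         # Blackberry:8081 -> container:8080
        IMAGE,
        "*.zim",  # glob passed unexpanded; assumed to be expanded inside the container
    ]


if __name__ == "__main__":
    # Hypothetical path; print the command for review instead of executing it.
    print(shlex.join(kiwix_run_command("/srv/kiwix/zim")))
```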

🕸️The Web Archive (ArchiveBox)

The Web Archive (ArchiveBox) is a self-hosted web archiving solution. Unlike a simple bookmark, it takes a "snapshot" of a page in multiple formats, so that if the original site goes down, you still have the full content. It is also hosted as a Docker application on Blackberry.
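Pages are added to the archive through the ArchiveBox CLI. The sketch below builds a `docker compose run archivebox add` command for a batch of URLs, assuming ArchiveBox is deployed with its standard Docker Compose file (service name `archivebox`) in the current directory; the helper name is hypothetical. It rejects non-HTTP(S) URLs up front and, as in ArchiveBox 0.7's CLI, only allows a `--depth` of 0 (the page itself) or 1 (the page plus the pages it links to).

```python
from urllib.parse import urlparse


def archivebox_add_command(urls: list[str], depth: int = 0) -> list[str]:
    """Build (but do not run) a docker compose command that archives the given URLs."""
    if not urls:
        raise ValueError("no URLs given")
    # ArchiveBox 0.7's `add` accepts --depth 0 (page only) or 1 (page + linked pages).
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    bad = [u for u in urls if urlparse(u).scheme not in ("http", "https")]
    if bad:
        raise ValueError(f"not http(s) URLs: {bad}")
    # Assumes the standard docker-compose.yml with a service named "archivebox".
    return ["docker", "compose", "run", "--rm", "archivebox", "add", f"--depth={depth}", *urls]


if __name__ == "__main__":
    import shlex

    cmd = archivebox_add_command(["https://example.com", "https://en.wikipedia.org/wiki/Kiwix"])
    print(shlex.join(cmd))
```

Building the command first and validating the URLs keeps a typo from starting a long archiving job on Blackberry.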