Data Archive
Wikisailor (talk | contribs) |
Latest revision as of 08:14, 9 February 2026
Introduction
We are building an offline vault of data for use over the next few years. To accomplish that, we need one or more virtual machines that can download and store the data. As most of the archival load falls on CPU, storage, and Internet bandwidth, we keep this part separate from Quince, the AI and Jellyfin host, which relies on GPU passthrough. For more general notes on Linux, Docker, and GPU passthrough, check Here. The Blackberry VM hosts Docker and some of the less intensive but equally useful tools.
Reasons for a Data Archive
We want a completely offline copy of as much of the data currently on the WWW as possible. The reason for doing so now, rather than earlier, is that the new wave of LLMs and AI tools is creating its own summarised and sanitised versions of the wealth of data that has accumulated over the last 30 or so years.
OpenAlex
OpenAlex is a massive, open catalog of the global research system, containing over 250 million "Works" (papers, datasets, etc.). With a web dashboard to construct queries to search for papers, it will complement the of research tool.
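The kind of query built in the dashboard can also be expressed against the public OpenAlex REST API. The following is only a minimal sketch, assuming the public `https://api.openalex.org/works` endpoint and its `search`, `per-page`, and `mailto` parameters are used; the helper names `build_works_url` and `fetch_titles` are hypothetical and not part of this setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public OpenAlex endpoint, not a locally hosted copy.
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(search, per_page=5, mailto=None):
    """Return an OpenAlex /works URL for a free-text search (hypothetical helper)."""
    params = {"search": search, "per-page": per_page}
    if mailto:
        # An optional contact address passed as the `mailto` parameter.
        params["mailto"] = mailto
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def fetch_titles(url, timeout=10):
    """Fetch one page of results and return the display names of the Works."""
    with urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    return [work.get("display_name") for work in data.get("results", [])]


if __name__ == "__main__":
    # Requires Internet access, so it only runs when executed directly.
    for title in fetch_titles(build_works_url("web archiving")):
        print(title)
```

Building the URL separately from fetching keeps the query inspectable, which is handy when comparing it with what the dashboard produces.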
📚The Kiwix Archive
The Kiwix Archive is hosted in a Docker container on Blackberry:8081 and provides access to an encyclopaedia made up of Wikipedia and a wide variety of related information sources. Most of the data is from either mid 2025 or early 2026, captured before the internet becomes too flattened and AI-washed to be recognisable.
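A quick way to confirm the container is answering on that port is a plain HTTP request. This is a rough sketch only, assuming the service speaks HTTP on `Blackberry:8081` and that the hostname resolves on the local network; `service_url` and `is_reachable` are hypothetical helper names.

```python
from urllib.error import URLError
from urllib.request import urlopen


def service_url(host, port, scheme="http"):
    """Build the base URL of a service from its host and port."""
    return f"{scheme}://{host}:{port}/"


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with a 2xx or 3xx status, else False."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (URLError, OSError, ValueError):
        # Covers refused connections, DNS failures, timeouts, HTTP errors,
        # and malformed URLs.
        return False


if __name__ == "__main__":
    # Assumption: "Blackberry" resolves from the machine running this check.
    kiwix = service_url("Blackberry", 8081)
    print(kiwix, "up" if is_reachable(kiwix) else "down")
```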
🕸️The Web Archive (ArchiveBox)
The Web Archive (ArchiveBox) is a self-hosted web archiving solution. Unlike a simple bookmark, it takes a "snapshot" of a page in multiple formats so that if the original site goes down, you still have the full content. It is also hosted as a Docker application on Blackberry.
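Queuing a page for a snapshot can be scripted around the `archivebox add` command. The sketch below only assembles the command line. It assumes ArchiveBox runs under Docker Compose with a service named `archivebox` (an assumption, as the actual service name on Blackberry is not recorded here), and `build_add_command` is a hypothetical helper.

```python
import subprocess


def build_add_command(url, depth=0, compose_service=None):
    """Return the argument list for adding one URL to ArchiveBox.

    depth=0 archives only the page itself; depth=1 also follows its links.
    compose_service is the Docker Compose service name (assumed, not confirmed).
    """
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    args = ["add", f"--depth={depth}", url]
    if compose_service:
        return ["docker", "compose", "run", "--rm", compose_service] + args
    return ["archivebox"] + args


if __name__ == "__main__":
    # Run from the directory holding the compose file; the service name is a guess.
    cmd = build_add_command("https://example.com", compose_service="archivebox")
    subprocess.run(cmd, check=True)
```

Returning a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.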