#title Mirroring a site running AmuseWiki #pubdate 2018-05-06 #lang en #cat howto #teaser How to download a whole amusewiki site, create a mirror and keep it updated Starting with AmuseWiki version 2.031, released on October 14, 2017, each AmuseWiki site provides a =/mirror/= path offering a static version of the site, suitable for mirroring, backup and batch download. See e.g. [[https://amusewiki.org/mirror]] Starting with AmuseWiki version 2.2, released on March 20, 2018, the list of files to download is provided on two URLs: =/mirror.txt= (basic version) and =/mirror.ts.txt= (advanced) E.g. [[https://amusewiki.org/mirror.txt]] and [[https://amusewiki.org/mirror.ts.txt]] *** Download the whole site (the easy way) If you have a GNU/Linux box, =wget= is already installed and mirroring is as easy as running this command (using [[https://amusewiki.org]] as example): {{{ wget -q -O - https://amusewiki.org/mirror.txt | wget -x -N -q -i - }}} Explanation: The first =wget= call will download the list of file and pipe it (=-O -=) to the second call which is going to download the piped list (=-i -=), create the needed directories (=-x=) and check the timestamps (=-N=), so it will not download again the files if not modified. All this is happening quietly (=-q=). **** Windows If you don’t have =wget= installed or you can’t pipe commands, the procedure is a bit different. First you need to install =wget=. See [[https://www.gnu.org/software/wget/]], [[https://www.gnu.org/software/wget/faq.html#download]] and [[https://eternallybored.org/misc/wget/]] Please keep in mind that this is a command line utility, so you are going to need the Windows command prompt. Go to the directory where you want to create the mirror. Download [[https://amusewiki.org/mirror.txt]] and fetch that list: {{{ wget https://amusewiki.org/mirror.txt wget -x -N -i mirror.txt }}} And that’s it. *** Private sites Private sites are not exposing =/mirror/= for obvious reasons. However, they can be mirrored with =wget= providing the credentials to the HTTP authentication. {{{ wget -q -O - --user=user --password=password \ https://private.amusewiki.org/mirror.txt | \ wget --user=user --password=password -x -N -q -i - }}} *** Advanced **** Filtering Creative people can also additionally filter the file list to exclude formats they don’t want or get only a specific format, editing (locally or on the fly) the file list passed to =wget=. Example: download all the EPUB files and put them in the current directory (no directory tree): {{{ wget -q -O - https://amusewiki.org/mirror.txt | grep '\.epub$' | wget -N -i - }}} **** Building ZIM file Mirror can be converted to [[http://www.openzim.org][ZIM file format]] for [[http://www.openzim.org/wiki/Readers][offline reading]]. Download all files, excluding bare HTML format: {{{ wget -q -O - https://amusewiki.org/mirror.txt | \ grep -v '\.bare.html$' | \ wget -x -N -q -i - }}} Compile ZIM file using [[https://github.com/openzim/zimwriterfs][zimwriterfs]]: {{{ zimwriterfs -w index.html \ -f site_files/favicon.ico \ -l EN \ -t Amusewiki \ -d "Amusewiki" \ -c "Amusewiki" \ -p "Amusewiki" \ amusewiki.org/mirror/ amuse.zim }}} **** Be nice with the servers The above described techniques are good for a one-time job, they don’t create much traffic if there are no changes, but they still hammer the sites with a lot of requests. For this purpose, another file list is provided at =/mirror.ts.txt=, which include the timestamp of the files (without the full URL). The format is: one filename, hash symbol, timestamp. One file per line. E.g.: {{{ titles.html#1525363603 topics.html#1525363603 authors.html#1525363603 }}} This can be easily parsed and a client can check the local timestamp before doing the request. See [[https://github.com/melmothx/amusewiki/blob/master/script/mirror-site.pl]] for a simple (and usable) implementation.