\documentclass[DIV=13,%
BCOR=0mm,%
headinclude=false,%
footinclude=false,open=any,%
fontsize=10pt,%
oneside,%
paper=a5]%
{scrbook}
\usepackage[noautomatic]{imakeidx}
\usepackage{microtype}
\usepackage{graphicx}
\usepackage{alltt}
\usepackage{verbatim}
\usepackage[shortlabels]{enumitem}
\usepackage{tabularx}
\usepackage[normalem]{ulem}
\def\hsout{\bgroup \ULdepth=-.55ex \ULset}
% https://tex.stackexchange.com/questions/22410/strikethrough-in-section-title
% Unclear if \protect \hsout is needed. Doesn't look so
\DeclareRobustCommand{\sout}[1]{\texorpdfstring{\hsout{#1}}{#1}}
\usepackage{wrapfig}
% avoid breakage on multiple
% and avoid the next [] to be eaten
\newcommand*{\forcelinebreak}{\strut\\*{}}
\newcommand*{\hairline}{%
\bigskip%
\noindent \hrulefill%
\bigskip%
}
% reverse indentation for biblio and play
\newenvironment*{amusebiblio}{
\leftskip=\parindent
\parindent=-\parindent
\smallskip
\indent
}{\smallskip}
\newenvironment*{amuseplay}{
\leftskip=\parindent
\parindent=-\parindent
\smallskip
\indent
}{\smallskip}
\newcommand*{\Slash}{\slash\hspace{0pt}}
% http://tex.stackexchange.com/questions/3033/forcing-linebreaks-in-url
\PassOptionsToPackage{hyphens}{url}\usepackage[hyperfootnotes=false,hidelinks,breaklinks=true]{hyperref}
\usepackage{bookmark}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setmainfont{texgyrepagella-regular.otf}[Script=Latin,%
Ligatures=TeX,%
Path=/usr/share/texmf/fonts/opentype/public/tex-gyre/,%
BoldFont=texgyrepagella-bold.otf,%
BoldItalicFont=texgyrepagella-bolditalic.otf,%
ItalicFont=texgyrepagella-italic.otf]
\setmonofont{cmuntt.ttf}[Script=Latin,%
Ligatures=TeX,%
Scale=MatchLowercase,%
Path=/usr/share/fonts/truetype/cmu/,%
BoldFont=cmuntb.ttf,%
BoldItalicFont=cmuntx.ttf,%
ItalicFont=cmunit.ttf]
\setsansfont{cmunss.ttf}[Script=Latin,%
Ligatures=TeX,%
Scale=MatchLowercase,%
Path=/usr/share/fonts/truetype/cmu/,%
BoldFont=cmunsx.ttf,%
BoldItalicFont=cmunso.ttf,%
ItalicFont=cmunsi.ttf]
\newfontfamily\englishfont{texgyrepagella-regular.otf}[Script=Latin,%
Ligatures=TeX,%
Path=/usr/share/texmf/fonts/opentype/public/tex-gyre/,%
BoldFont=texgyrepagella-bold.otf,%
BoldItalicFont=texgyrepagella-bolditalic.otf,%
ItalicFont=texgyrepagella-italic.otf]
\renewcommand*{\partpagestyle}{empty}
% global style
\pagestyle{plain}
\usepackage{indentfirst}
% remove the numbering
\setcounter{secnumdepth}{-2}
% remove labels from the captions
\renewcommand*{\captionformat}{}
\renewcommand*{\figureformat}{}
\renewcommand*{\tableformat}{}
\KOMAoption{captions}{belowfigure,nooneline}
\addtokomafont{caption}{\centering}
\deffootnote[3em]{0em}{4em}{\textsuperscript{\thefootnotemark}~}
\addtokomafont{disposition}{\rmfamily}
\addtokomafont{descriptionlabel}{\rmfamily}
\frenchspacing
% avoid vertical glue
\raggedbottom
% this will generate overfull boxes, so we need to set a tolerance
% \pretolerance=1000
% pretolerance is what is accepted for a paragraph without
% hyphenation, so it makes sense to be strict here and let the user
% tweak the tolerance instead.
\tolerance=200
% Additional tolerance for bad paragraphs only
\setlength{\emergencystretch}{30pt}
% (try to) forbid widows/orphans
\clubpenalty=10000
\widowpenalty=10000
% given that we said footinclude=false, this should be safe
\setlength{\footskip}{2\baselineskip}
\title{Mirroring a site running AmuseWiki}
\date{}
\author{}
\subtitle{}
% https://groups.google.com/d/topic/comp.text.tex/6fYmcVMbSbQ/discussion
\hypersetup{%
pdfencoding=auto,
pdftitle={Mirroring a site running AmuseWiki},%
pdfauthor={},%
pdfsubject={},%
pdfkeywords={howto}%
}
\begin{document}
\begin{titlepage}
\strut\vskip 2em
\begin{center}
{\usekomafont{title}{\huge Mirroring a site running AmuseWiki\par}}%
\vskip 1em
\vskip 2em
\vskip 1.5em
\vfill
\strut\par
\end{center}
\end{titlepage}
\cleardoublepage
\tableofcontents
% start a new right-handed page
\cleardoublepage
Starting with AmuseWiki version 2.031, released on October 14, 2017,
each AmuseWiki site provides a \texttt{/mirror/} path offering a static
version of the site, suitable for mirroring, backup and batch
download. See e.g. \href{https://amusewiki.org/mirror}{\texttt{https://amusewiki.org/mirror}}.

Starting with AmuseWiki version 2.2, released on March 20, 2018, the
list of files to download is provided at two URLs: \texttt{/mirror.txt}
(basic version) and \texttt{/mirror.ts.txt} (advanced version, with timestamps),
e.g. \href{https://amusewiki.org/mirror.txt}{\texttt{https://amusewiki.org/mirror.txt}} and \href{https://amusewiki.org/mirror.ts.txt}{\texttt{https://amusewiki.org/mirror.ts.txt}}.
\section{Download the whole site (the easy way)}
If you have a GNU\Slash{}Linux box, \texttt{wget} is usually already installed and mirroring
is as easy as running this command (using \href{https://amusewiki.org}{\texttt{https://amusewiki.org}} as
an example):
\begin{alltt}
wget -q -O - https://amusewiki.org/mirror.txt \textbar{} wget -x -N -q -i -
\end{alltt}
Explanation:
the first \texttt{wget} call downloads the list of files and writes it to standard output (\texttt{-O -}),
which is piped to the second call. The second call reads the list from standard input (\texttt{-i -}),
creates the needed directories (\texttt{-x}) and checks the timestamps (\texttt{-N}), so files that
have not been modified are not downloaded again.
Both calls run quietly (\texttt{-q}).
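
To make the effect of \texttt{-x} more concrete, the following Python sketch maps a URL from the list to the local path that a download with forced directories would typically use: the host name followed by the URL path. The function name \texttt{local\_path\_for} and the sample URLs are assumptions made for illustration; the sketch ignores query strings and character escaping, which \texttt{wget} handles on its own.
\begin{verbatim}
from urllib.parse import urlsplit


def local_path_for(url):
    """Return host name plus URL path, as a relative local path."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # A URL that names a directory is stored as index.html inside it.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return parts.netloc + "/" + path


if __name__ == "__main__":
    # Hypothetical entries, shaped like the lines of /mirror.txt.
    for url in (
        "https://amusewiki.org/mirror/titles.html",
        "https://amusewiki.org/mirror/site_files/favicon.ico",
    ):
        print(local_path_for(url))
\end{verbatim}
This is why the mirror ends up in a directory named after the site, such as \texttt{amusewiki.org/mirror/}.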
\subsection{Windows}
If you don’t have \texttt{wget} installed or you can’t pipe commands, the
procedure is a bit different.
First you need to install \texttt{wget}. See
\href{https://www.gnu.org/software/wget/}{\texttt{https://www.gnu.org/software/wget/}},
\href{https://www.gnu.org/software/wget/faq.html\#download}{\texttt{https://www.gnu.org/software/wget/faq.html\#download}} and
\href{https://eternallybored.org/misc/wget/}{\texttt{https://eternallybored.org/misc/wget/}}.
Please keep in mind that this is a command line utility, so you are
going to need the Windows command prompt.
Go to the directory where you want to create the mirror. Download \href{https://amusewiki.org/mirror.txt}{\texttt{https://amusewiki.org/mirror.txt}} and fetch that list:
\begin{alltt}
wget https://amusewiki.org/mirror.txt
wget -x -N -i mirror.txt
\end{alltt}
And that’s it.
\section{Private sites}
Private sites do not expose \texttt{/mirror/} publicly, for obvious reasons.
However, they can be mirrored with \texttt{wget} by providing the credentials for
HTTP authentication:
\begin{alltt}
wget -q -O - --user=user --password=password \textbackslash{}
https://private.amusewiki.org/mirror.txt \textbar{} \textbackslash{}
wget --user=user --password=password -x -N -q -i -
\end{alltt}
\section{Advanced}
\subsection{Filtering}
You can also filter the file list to exclude formats you don’t want,
or to get only a specific format, by editing
(locally or on the fly) the file list passed to \texttt{wget}.
Example: download all the EPUB files and put them in the current
directory (no directory tree):
\begin{alltt}
wget -q -O - https://amusewiki.org/mirror.txt \textbar{} grep '\textbackslash{}.epub\$' \textbar{} wget -N -i -
\end{alltt}
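
Where \texttt{grep} is not available, or the filter needs more than one rule, the same selection can be done on a locally saved list. The Python sketch below is one possible way to do it; the function name \texttt{filter\_urls}, its \texttt{keep} and \texttt{drop} parameters and the sample URLs are assumptions made for illustration, not part of AmuseWiki.
\begin{verbatim}
def filter_urls(lines, keep=None, drop=()):
    """Yield URLs ending in a keep suffix (if given) and no drop suffix."""
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip empty lines
        if keep is not None and not url.endswith(tuple(keep)):
            continue
        if url.endswith(tuple(drop)):
            continue
        yield url


if __name__ == "__main__":
    # Hypothetical entries, shaped like the lines of /mirror.txt.
    sample = [
        "https://amusewiki.org/mirror/library/example.epub",
        "https://amusewiki.org/mirror/library/example.pdf",
        "https://amusewiki.org/mirror/library/example.bare.html",
    ]
    for url in filter_urls(sample, keep=[".epub"]):
        print(url)
\end{verbatim}
The filtered lines can be written to a file and passed to \texttt{wget -N -i} as before.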
\subsection{Building a ZIM file}
A mirror can be converted to the \href{http://www.openzim.org}{ZIM file format} for \href{http://www.openzim.org/wiki/Readers}{offline reading}.
Download all files, excluding the bare HTML format:
\begin{alltt}
wget -q -O - https://amusewiki.org/mirror.txt \textbar{} \textbackslash{}
grep -v '\textbackslash{}.bare\textbackslash{}.html\$' \textbar{} \textbackslash{}
wget -x -N -q -i -
\end{alltt}
Compile the ZIM file using \href{https://github.com/openzim/zimwriterfs}{zimwriterfs}:
\begin{alltt}
zimwriterfs -w index.html \textbackslash{}
-f site\_files/favicon.ico \textbackslash{}
-l EN \textbackslash{}
-t Amusewiki \textbackslash{}
-d "Amusewiki" \textbackslash{}
-c "Amusewiki" \textbackslash{}
-p "Amusewiki" \textbackslash{}
amusewiki.org/mirror/ amuse.zim
\end{alltt}
\subsection{Be nice to the servers}
The techniques described above are fine for a one-time job. They don’t
create much traffic if there are no changes, but they still hammer the
site with at least one request per listed file.

To avoid this, another file list is provided at \texttt{/mirror.ts.txt},
which includes the timestamp of each file (without the full URL). The format is
one file per line: filename, hash symbol, timestamp. E.g.:
\begin{alltt}
titles.html\#1525363603
topics.html\#1525363603
authors.html\#1525363603
\end{alltt}
This can be easily parsed, and a client can check the local timestamp
before making a request.
See
\href{https://github.com/melmothx/amusewiki/blob/master/script/mirror-site.pl}{\texttt{https://github.com/melmothx/amusewiki/blob/master/script/mirror-site.pl}}
for a simple (and usable) implementation.
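
As a rough illustration of that idea, the Python sketch below parses lines in the \texttt{filename\#timestamp} format and reports which files a client would need to fetch. It is not the \texttt{mirror-site.pl} script linked above; the function names \texttt{parse\_ts\_list} and \texttt{needs\_download}, the local \texttt{mirror} directory, and the reading of the timestamp as Unix epoch seconds are assumptions made for the example.
\begin{verbatim}
import os


def parse_ts_list(text):
    """Parse 'filename#timestamp' lines into a {filename: timestamp} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last hash, so a hash inside a name is kept.
        name, sep, stamp = line.rpartition("#")
        if not sep or not name or not stamp.isdigit():
            continue  # skip lines that do not match the format
        entries[name] = int(stamp)
    return entries


def needs_download(local_path, remote_ts):
    """True if the local copy is missing or older than the listed time."""
    try:
        return os.path.getmtime(local_path) < remote_ts
    except OSError:
        return True


if __name__ == "__main__":
    sample = "titles.html#1525363603\ntopics.html#1525363603\n"
    for name, stamp in parse_ts_list(sample).items():
        # Assumption: local copies live under a directory named "mirror".
        if needs_download(os.path.join("mirror", name), stamp):
            print("would fetch", name)
\end{verbatim}
A real client would then request only the reported files, and could set each local modification time to the listed timestamp (for instance with \texttt{os.utime}) so that the next run skips them.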
% begin final page
\clearpage
% new page for the colophon
\thispagestyle{empty}
\begin{center}
\bigskip
\includegraphics[width=0.25\textwidth]{logo-amw.pdf}
\bigskip
\end{center}
\strut
\vfill
\begin{center}
Mirroring a site running AmuseWiki
\bigskip
\bigskip
\textbf{amusewiki.org}
\end{center}
% end final page with colophon
\end{document}
% No format ID passed.