FactGrid:Setup
This page describes the technical setup of the FactGrid website and services. FactGrid currently runs on a single virtual server, and all the file system paths mentioned here refer to that server.
See also /1.39 upgrade for a description of the process that was used to upgrade FactGrid from MediaWiki 1.35 to 1.39.
Database Details
- CPU: 4× Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz according to /proc/cpuinfo
- RAM: 7.7 GiB (8.1 GB) according to free, 8068724 kB according to /proc/meminfo (plus 7.9 GiB (8.3 GB) of swap)
- free snapshot (low load): 3.3 GiB used, 4.3 GiB buff/cache
- HD: 976 GiB (1.1 TB) according to df, ext4, on LVM (but as far as I can see on a single disk only, which in turn is virtual according to lsblk, see below); of that, 133 GiB (143 GB) is used, i.e. 15% disk usage
- VM: vmware according to systemd-detect-virt
- OS: Debian GNU/Linux 9 (stretch) according to /etc/os-release; however, php7.4 instead of php7.0 (from packages.sury.org/php)
This is the system on which both the wiki (web server, PHP) and the query service (Blazegraph plus updater) run (i.e. the setup has not been distributed across multiple systems so far). Details of the setup follow:
Packages
Additional packages installed include:
- php-dom for MediaWiki
- php-mbstring for MediaWiki
- php-xml for MediaWiki
- php-gmp for MediaWiki (suggested by wikimedia/avro, not sure if needed but can’t hurt)
- php-intl for Unicode support in QuickStatements
- php-curl for Elastica / CirrusSearch
- for building a local Python (for OpenRefine-Wikibase reconciliation service) (with the upgrade to Debian Bullseye, this is probably no longer needed):
  - build-essential
  - libssl1.0-dev
  - libreadline-dev
  - zlib1g-dev
  - libffi-dev
- redis-server for OpenRefine-Wikibase reconciliation service
This list is probably incomplete. I hope to add to it in the future if any further packages are installed, but many existing installed packages are not recorded here.
MediaWiki
MediaWiki is installed as a Git clone of the REL1_39 branch under /var/www/w-1.39/, symlinked into /var/www/w/.
Apache serves /var/www/ as document root,
with the standard MediaWiki short URL setup to rewrite /wiki/ into /w/index.php.
MediaWiki extensions and skins are checked out as Git repositories
(some of them are registered as submodules in the REL1_39 branch),
but vendor/ is installed via Composer,
instead of using mediawiki-vendor.
(A composer.local.json file instructs Composer to include dependencies of extensions and skins.)
Image uploads are enabled (images is owned by www-data:www-data).
The job queue is processed by the mediawiki-jobqueue.service unit,
which is configured to frequently restart itself,
to avoid both running outdated PHP code for too long and out-of-memory errors.
A daily mediawiki-jobqueue-restart.timer additionally restarts the job queue service,
to avoid situations where the job queue fails to start due to database errors and systemd gives up on restarting it forever.
QuickStatements
The git repositories for quickstatements and its dependency magnustools are cloned under /srv/,
and symlinks in /var/www/ point into their public_html/ subdirectories.
(The clones were originally named /srv/quickstatements and /srv/magnustools,
but newer versions, cloned under /srv/quickstatements_2023 and /srv/magnustools_2023, have been in use since 26 February 2023.)
There is an oauth.ini configuration file in /srv/quickstatements_2023/
(for this consumer,
with a request modeled after the original Wikidata consumer),
and a config.json file in /srv/quickstatements_2023/public_html/
describes the URL layout of the FactGrid site
and selects FactGrid as the site to use.
Logs go to /srv/quickstatements_2023/tool.log,
which is owned by the www-data group and group-writable.
Batches which the user requests to run in the background,
instead of directly in the browser,
are saved to the quickstatements_2023 database,
to which the quickstatements_2023 SQL user has access;
both the openDbTool() calls and setAuthDbName() method in QuickStatements and the openDbTool() function in Magnustools
have been patched to access this database instead of the normal (very Toolforge-specific) database access code,
using the password residing in the /srv/quickstatements_2023/db-password file,
which is owned by the www-data group and group- but not world-readable.
QuickStatements has also been patched to format batch links in its edit summaries
using the quickstatements: link prefix,
instead of the usual toollabs:quickstatements/;
the quickstatements: interwiki prefix was installed with the following command
(via the maintenance/sql.php script):
INSERT INTO factgridinterwiki (iw_prefix, iw_url, iw_local, iw_trans) VALUES ('quickstatements', '/quickstatements/$1', 1, 0);
The bot which actually processes the batches runs as quickstatements-bot.service,
loading batches from the database and sending the appropriate edit requests to the API.
(When it has nothing to do, it sleeps in one-second intervals.)
Make sure to run systemctl restart quickstatements-bot whenever code changes to QuickStatements are made,
otherwise the bot will not pick them up.
Reasonator
The git repository for reasonator is cloned under /srv,
and a symlink in /var/www/ points into its public_html/v2/ subdirectory.
config.json is copied from config.json.template
with some property IDs replaced with their FactGrid equivalent,
a few replaced with “TODO”,
and most other property IDs completely removed because they don’t apply to FactGrid.
There are also minor uncommitted changes in vue.js (avoid CORS errors) and main-page.html (replace example items),
though hopefully those should become unnecessary in the future.
Query service
Upstream instructions:
The query service source is cloned in ~factgrid/wikidata-query-rdf/,
built using ant as described in the “getting started” document,
and unzipped into /srv/wdqs-0.3.97-SNAPSHOT/ (to which /srv/wdqs/ is a symlink).
RWStore.properties is edited to adjust the location of the journal file,
which we have in /var/lib/wdqs/factgrid.jnl;
mwservices.conf is edited to add database.factgrid.de to the allowed MWAPI endpoints;
whitelist.txt is added to allow SPARQL federation with the following endpoints:
- WDQS (SERVICE <https://query.wikidata.org/sparql> { ... })
- DBpedia (SERVICE <https://dbpedia.org/sparql> { ... })
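To illustrate what the federation allowlist makes possible, here is a minimal sketch of a federated query that asks WDQS for the English label of one Wikidata entity. The entity (Q42), the helper name run_query, and the endpoint URL https://database.factgrid.de/sparql are assumptions for illustration; the helper is defined but not called, since it needs network access.

```shell
# Write an example federated query to a scratch file.
# Full IRIs are used because prefixes such as wd: may resolve to
# FactGrid entities rather than Wikidata ones on this endpoint.
query_file="$(mktemp)"
cat > "$query_file" <<'EOF'
SELECT ?label WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    <http://www.wikidata.org/entity/Q42>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    FILTER(LANG(?label) = "en")
  }
}
EOF

# Hypothetical helper: send a query file to the (assumed) FactGrid endpoint
# as a GET request and ask for JSON results.
run_query() {
  curl --silent --get \
    --header 'Accept: application/sparql-results+json' \
    --data-urlencode "query@$1" \
    https://database.factgrid.de/sparql
}

# Usage (requires network access):
#   run_query "$query_file"
```

A SERVICE clause pointing at an endpoint that is not listed in whitelist.txt would be expected to be rejected by the query service.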
The query service itself runs as the blazegraph.service systemd unit
(run systemctl cat blazegraph to see the configuration files).
Its standard output and error go to the journal,
and can be viewed by administrators with journalctl -u blazegraph (add -e for the latest messages).
Apache2 is configured (/etc/apache2/sites-available/001-factgrid-ssl.conf)
to forward requests to /sparql to Blazegraph.
It adds Blazegraph-specific request headers to enforce a max query time (60 seconds) and read-only mode,
and an Access-Control-Allow-Origin response header to allow client-side JavaScript code to read query responses without restrictions.
The updater for the query service,
which reads updates from the wiki’s recent changes and applies them to the query service,
similarly runs as blazegraph-update.service.
The query service UI is cloned in ~factgrid/wikidata-query-gui/.
It can be built using npm run build,
and the resulting build/ directory is then copied into /var/www/,
with a symlink /var/www/query pointing to the latest version.
A few of the files in the repository have uncommitted changes specific to FactGrid;
before updating the GUI, they have to be stashed away.
git stash save &&
git pull &&
git stash pop &&
npm install &&
npm run build &&
cp -a custom-config.json factgrid.png build/ &&
now=$(date -Iseconds) &&
cp -a build/ /var/www/query-"$now" &&
ln -sfT query-"$now" /var/www/query # atomically update symlink
# optional: remove the old /var/www/query-* directory
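The symlink swap at the end of the update command can be tried out safely on a scratch directory. The sketch below uses a rename-based swap (create the new link under a temporary name, then mv -T it over the old one) as an alternative to ln -sfT; whether ln -sfT itself replaces the link in a single step may depend on the coreutils version, so this variant is offered as an assumption-free fallback, not as what the server uses. The directory names query-old and query-new are made up for the demonstration.

```shell
# Scratch directory standing in for /var/www/.
www="$(mktemp -d)"
mkdir "$www/query-old" "$www/query-new"

# Starting state: the link points at the old build.
ln -s query-old "$www/query"

# Create the new link under a temporary name, then rename it over the
# existing link. The rename replaces the destination in one step, so
# there is no moment at which "query" does not exist.
ln -s query-new "$www/query.tmp"
mv -T "$www/query.tmp" "$www/query"

target="$(readlink "$www/query")"
echo "query -> $target"
```

After the swap, the old directory is still present and can be removed once nothing refers to it any more.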
Dumps
dump-json.service creates a gzip-compressed JSON dump in /srv/dumps/, named after the current date (ISO 8601 format).
dump-json.timer runs that service each day at 21:00 (CET).
/srv/dumps/ is symlinked into /var/www/ (i.e. https://database.factgrid.de/dumps/);
systemd-tmpfiles-clean.service, configured via /etc/tmpfiles.d/dumps.conf, removes dumps after 90 days.
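The naming and retention scheme can be illustrated on a scratch directory. Note that this sketch swaps in find for the cleanup step purely for demonstration; on the server, the cleanup is done by systemd-tmpfiles as described above. The .json.gz suffix and the file names old and recent are assumptions, since the exact dump file names are not given here.

```shell
# Scratch directory standing in for /srv/dumps/.
dumps="$(mktemp -d)"

# A dump named after today's date in ISO 8601 format (suffix assumed).
today_dump="$dumps/$(date -I).json.gz"
touch "$today_dump"

# Two stand-in dumps with backdated modification times (GNU touch syntax).
touch -d '100 days ago' "$dumps/old.json.gz"
touch -d '10 days ago' "$dumps/recent.json.gz"

# Stand-in for the 90-day retention: delete files last modified more
# than 90 days ago.
find "$dumps" -type f -mtime +90 -delete

ls "$dumps"
```

Only the file backdated by 100 days is removed; today's dump and the 10-day-old one are kept.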
Reconciliation service
An instance of the openrefine-wikibase service is installed in /home/factgrid/openrefine-wikibase/,
with dependencies in a venv under .venv/ and configuration in config.py.
(Prior to the upgrade to Debian 11 / Bullseye, it used a locally built Python 3.9.9 with sources in /home/factgrid/Python-3.9.9/, installed using make altinstall under prefix /usr/local/;
this old Python is mostly still around, because Python doesn’t provide a make uninstall command, but it’s no longer used, and I manually renamed the /usr/local/bin files to avoid confusion.)
openrefine-wikibase.service runs the service on localhost, port 8000;
Apache is configured to proxy https://database.factgrid.de/reconcile/ to this service,
which means the actual reconciliation service URL to configure in OpenRefine is https://database.factgrid.de/reconcile/en/api,
or https://database.factgrid.de/reconcile/de/api for German labels/descriptions.
A Wikibase manifest for OpenRefine is available at https://database.factgrid.de/factgrid-manifest.json.
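The per-language service URLs follow a simple pattern, sketched below with a hypothetical helper named reconcile_url. Only the en and de variants are confirmed above; that other language codes work the same way is an assumption. The curl call is left as a comment because it needs network access.

```shell
# Hypothetical helper: build the reconciliation service URL for a language code.
reconcile_url() {
  printf 'https://database.factgrid.de/reconcile/%s/api\n' "$1"
}

url_en="$(reconcile_url en)"
url_de="$(reconcile_url de)"
echo "English: $url_en"
echo "German:  $url_de"

# A plain GET on the endpoint is expected to return the service's
# description as JSON, which makes for a quick health check:
#   curl --silent "$url_en"
```

If the health check fails while openrefine-wikibase.service is running, the Apache proxy configuration is the next place to look.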
ElasticSearch
ElasticSearch is installed via the 7.10.2 .deb package,
with the org.wikimedia.search:extra:7.10.2-wmf4 and org.wikimedia.search.highlighter:experimental-highlighter-elasticsearch-plugin:7.10.2 plugins installed via /usr/share/elasticsearch/bin/elasticsearch-plugin install name:version.
CirrusSearch and WikibaseCirrusSearch are installed, mainly according to the CirrusSearch README;
note that $wgWBCSUseCirrus must already be true when the search index is initialized.
$wgWBRepoSettings['searchIndexTypes'] lists the same property data types to index for haswbstatement search as in production:
string, external-id, url, wikibase-item, wikibase-property, wikibase-lexeme, wikibase-form, wikibase-sense.