FactGrid:Setup/1.39 upgrade

This page documents the steps that I (User:Lucas Werkmeister) followed to upgrade FactGrid from MediaWiki 1.35 to 1.39. Note that there were some missteps along the way – you shouldn’t treat this as instructions to follow step by step, at least not without reading the whole document first. I also can’t guarantee that this is complete.

Separate test installation

Before attempting to upgrade the main installation, I wanted to set up a separate 1.39 copy. I took an SQL dump without the CREATE DATABASE statement:

(umask go-rwx && mysqldump -uwikidata -pREDACTED --no-create-db factgridwikidata > mysqldump-2023-02-05.sql)
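
The parentheses matter here: the umask change only applies inside the subshell, so the dump file is created without group or world access while the rest of the session keeps its normal umask. A minimal sketch of that behaviour, using a temporary directory and a stand-in file instead of a real dump (the file name is hypothetical, and `stat -c` assumes GNU coreutils):

```shell
# Remember the umask of the surrounding shell.
outer_umask=$(umask)

# Create a stand-in "dump" inside a subshell with a restrictive umask.
tmpdir=$(mktemp -d)
(umask go-rwx && printf 'secret\n' > "$tmpdir/dump.sql")

# The file has no group/other permission bits (typically mode 600) ...
dump_mode=$(stat -c '%a' "$tmpdir/dump.sql")
printf 'mode of dump.sql: %s\n' "$dump_mode"

# ... and the outer shell's umask is unchanged.
printf 'umask before: %s, after: %s\n' "$outer_umask" "$(umask)"
```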

Then I set up a separate database, user and password in MySQL and imported that dump into the database. Next, I cloned MediaWiki 1.39 along with extensions and skins, somewhat like this:

git clone -b REL1_39 https://gerrit.wikimedia.org/r/mediawiki/core.git w-1.39-ephemeral
cp w/LocalSettings.php w-1.39-ephemeral/
cp -r w/images/* w-1.39-ephemeral/images/  # note: this should probably have been -a instead of -r, see the “Later fixes” section below
cp w/composer.local.json w-1.39-ephemeral/
cd w-1.39-ephemeral/
(cd skins/ && for skin in MonoBook Timeless Vector; do git clone -b REL1_39 --reference /var/www/w/skins/$skin/ --dissociate https://gerrit.wikimedia.org/r/mediawiki/skins/$skin.git; done)
(cd extensions/ && for extension in CategoryTree Cite CiteThisPage CodeEditor ConfirmAccount ConfirmEdit Gadgets ImageMap InputBox Interwiki LocalisationUpdate MultimediaViewer Nuke OATHAuth OAuth PageImages ParserFunctions PdfHandler Poem Renameuser ReplaceText Scribunto SecureLinkFixer SpamBlacklist SyntaxHighlight_GeSHi TemplateData TextExtracts TitleBlacklist UniversalLanguageSelector VisualEditor WikiEditor Wikibase WikibaseLexeme cldr; do git clone -b REL1_39 --reference /var/www/w/extensions/$extension/ --dissociate https://gerrit.wikimedia.org/r/mediawiki/extensions/$extension.git; done)

I downloaded the latest Composer following its instructions and used it to install dependencies. I then edited the copied LocalSettings.php to adjust the $wgScriptPath and $wgArticlePath for /w-1.39-ephemeral, change the database credentials to point at the new database, update the way Wikibase is loaded, and comment out the wikibase-forms extension for now.

After double-checking in php maintenance/sql.php that this MediaWiki install was definitely targeting the copied database and not the live one (checking that fresh sandbox edits in real FactGrid didn’t show up in the other database), I ran the database upgrade:

time php maintenance/update.php --quick

This took just over an hour, with most of the time spent finishing the actor table migration of the revision table.
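
Alongside the manual check that the test wiki really targets the copied database, a static guard on the two LocalSettings files can catch the simplest mistake before update.php is run. This is only a sketch under assumptions: the function names are hypothetical, the parsing is deliberately naive (it only understands a plain quoted `$wgDBname = '...';` assignment), and it does not notice two equally named databases on different hosts.

```shell
# Print the last value assigned to $wgDBname in a LocalSettings-style file.
db_name_of() {
    awk -F"[\"']" '/^[[:space:]]*\$wgDBname[[:space:]]*=/ { name = $2 } END { print name }' "$1"
}

# Succeed only if both files name a database and the names differ.
assert_different_databases() {
    live_name=$(db_name_of "$1")
    test_name=$(db_name_of "$2")
    if [ -z "$live_name" ] || [ -z "$test_name" ]; then
        echo "could not read \$wgDBname from both files" >&2
        return 2
    fi
    if [ "$live_name" = "$test_name" ]; then
        echo "both installations use database $live_name" >&2
        return 1
    fi
    echo "ok: $live_name vs $test_name"
}
```

It could be called as `assert_different_databases w/LocalSettings.php w-1.39-ephemeral/LocalSettings.php && php maintenance/update.php --quick`.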

I also had to initialize the Wikibase submodules:

cd /var/www/w-1.39-ephemeral/extensions/Wikibase
git submodule update --init --recursive

And with that, the 1.39 wiki was pretty much working. However, User:Olaf Simons pointed out that search wasn’t working very well: in Wikibase 1.39, term-based search is case sensitive, and to get better search you have to set up CirrusSearch / ElasticSearch.

CirrusSearch

Installing ElasticSearch was, fortunately, not as difficult as I had feared. While the latest version is no longer available under a free license, you can still download 7.10.2, including as a convenient .deb package with few dependencies. I downloaded and installed that package and added two plugins needed by CirrusSearch:

/usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:7.10.2-wmf4
/usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search.highlighter:experimental-highlighter-elasticsearch-plugin:7.10.2

I also added a mitigation for the Log4j vulnerabilities, based on the Wikibase release pipeline:

printf '%s\n' '-Dlog4j2.formatMsgNoLookups=true' > /etc/elasticsearch/jvm.options.d/T297674.options

I then started elasticsearch.service and confirmed with lsof -iTCP -sTCP:LISTEN -n -P that it only listens for connections on localhost.
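
Reading the lsof output by eye works, but the same check can be scripted. The following sketch (the function name is hypothetical) filters the output of the lsof command above and prints every listener that is not bound to a loopback address; an empty result for the Elasticsearch process is what is wanted. It is deliberately conservative: anything that does not start with 127. or [::1] is reported.

```shell
# Read `lsof -iTCP -sTCP:LISTEN -n -P` output on stdin and print
# "COMMAND ADDRESS" for each listener not bound to a loopback address.
non_loopback_listeners() {
    awk '$NF == "(LISTEN)" {
        addr = $(NF - 1)
        if (addr !~ /^127\./ && addr !~ /^\[::1\]/) print $1, addr
    }'
}
```

Usage would look like `lsof -iTCP -sTCP:LISTEN -n -P | non_loopback_listeners`.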

Next, I cloned the Elastica, CirrusSearch, and WikibaseCirrusSearch extensions, and added this to the LocalSettings:

wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
wfLoadExtension( 'WikibaseCirrusSearch' );
$wgDisableSearchUpdate = true;
$wgWBCSUseCirrus = false;  # note: this was wrong, see below

I then followed the instructions in the CirrusSearch README, running:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

Then I commented out $wgDisableSearchUpdate and made a test edit on the sandbox item. I also had to run php maintenance/runJobs.php manually, since the 1.39 wiki had no job queue runner yet; I later set one up. After that, the test edit was reflected in the &action=cirrusDump output, so I ran:

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

This took 23 minutes (Indexed a total of 472627 pages at 340/second); then I ran:

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse

This took 18 minutes (Indexed a total of 472627 pages at 431/second). I then enabled CirrusSearch in the LocalSettings like so:

$wgSearchType = 'CirrusSearch';
$wgWBCSUseCirrus = true;

However, this was actually premature; the two previous commands mainly seem to add a lot of jobs to the job queue, but those jobs need to run before the search is ready. So I set up a 1.39 job runner as a copy of the 1.35 one; due to out-of-memory errors, I reduced the number of jobs it should run before restarting (--maxjobs=50), which meant I also needed to tweak the systemd restart settings so it wouldn’t restart too often and run into the rate limit:

RestartSec=1s
StartLimitIntervalSec=10s
StartLimitBurst=20
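
The reasoning behind these numbers can be checked with a little arithmetic: with a one-second pause before each restart, the service can start at most about eleven times in any ten-second window, which stays below the burst limit of 20. A rough sketch of that check (the function name is hypothetical, and it simplifies by ignoring the time the jobs themselves take, which only lowers the restart rate further):

```shell
# Succeed if a service that exits immediately and is restarted after
# restart_sec seconds cannot exceed `burst` starts within interval_sec.
# All arguments are whole numbers in the same time unit.
restart_limit_ok() {
    restart_sec=$1
    interval_sec=$2
    burst=$3
    # Without any delay, the number of starts per interval is unbounded.
    [ "$restart_sec" -gt 0 ] || return 1
    max_starts=$(( interval_sec / restart_sec + 1 ))
    [ "$max_starts" -le "$burst" ]
}

# The values from the unit settings above: 1s delay, 10s window, burst of 20.
if restart_limit_ok 1 10 20; then
    echo "restart rate stays within the start limit"
fi
```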

The jobs took ca. 2½ hours to run to completion. However, at this point, wbsearchentities still didn’t work; after configuring a $wgDebugLogFile, I could see that this was due to the index being set up incorrectly (Field [sitelink_count] does not exist in mappings), since it had been initialized while $wgWBCSUseCirrus was false. Re-running UpdateSearchIndexConfig.php also told me that the config did not match, so as advertised by the output of that script, and in accordance with Upgrading (1.B) of the CirrusSearch README, I ran:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now  # took 13 minutes
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php  # took 33 minutes

And with that, the wiki had working case-insensitive entity search again, now based on CirrusSearch.

Production upgrade

The actual upgrade of the wiki was scheduled for Saturday, 18 February 2023, with the read-only phase beginning at 13:00 CET (12:00 UTC). Based on the update.php runtime earlier, I estimated that the read-only phase would last one hour; this turned out to be a bit of an underestimate because the SQL dump, which I wanted to do prior to update.php and also while read-only, took some time as well. (In theory, you could probably start the SQL dump and then immediately start update.php, relying on MySQL to keep the transactions separate, but I didn’t want to do that.)

Preparation

I decided to use the existing 1.39 source tree of the test wiki for the 1.39 version of real FactGrid too, so that I wouldn’t forget to initialize the Wikibase git submodules again or make other mistakes; I would just change its LocalSettings to point to the real database again. But first of all, I moved this source tree from its previous path, where it could be publicly tested (/w-1.39-ephemeral), to an unguessable path that would prevent anyone from reaching it prematurely: /w-1.39-unreachable-hash, where hash was generated via pwgen | md5sum. I then stopped mediawiki-jobqueue-1.39.service (which was now referencing a no-longer-existing path).
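
Note that pwgen | md5sum prints the hash followed by a trailing " -", which has to be cut off before the value can be used in a path. A small sketch of generating such a suffix; as an assumption for portability it reads from /dev/urandom instead of pwgen (which may not be installed everywhere), and the function name is hypothetical:

```shell
# Print 32 hexadecimal characters derived from random bytes.
random_suffix() {
    head -c 32 /dev/urandom | md5sum | cut -d ' ' -f 1
}

suffix=$(random_suffix)
printf 'w-1.39-unreachable-%s\n' "$suffix"
```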

I now remembered the wikibase-forms extension, which I’d previously commented out to deal with later and then forgotten about. Fortunately, I had some time left and fixing it wasn’t too difficult: my fork of it now has a 1.39 branch. After testing the new version, I made the 1.39 wiki read-only in its LocalSettings:

$wgReadOnly = ( PHP_SAPI === 'cli' ) ? false : 'Das Wiki ist während dem Update auf MediaWiki 1.39 bis ca. 13:00 Uhr UTC (14:00 Uhr MEZ) nur lesbar, wir bitten die Unannehmlichkeiten zu entschuldigen. / The wiki is read-only during the MediaWiki 1.39 update until ca. 13:00 UTC, we apologize for the inconvenience.';

I wanted to delete the ElasticSearch data of the test wiki so they wouldn’t take up disk space forever, and I wanted to do so before starting to use ElasticSearch on the real wiki, so there was no risk of mixing up the test and production search data. So I added $wgDisableSearchUpdate = true; to the 1.39 LocalSettings, to stop new search updates flowing in, and then deleted the two test indices from ElasticSearch, based on StackOverflow:

curl -s -XGET 'http://localhost:9200/_cat/indices?v'  # list indices
curl -X DELETE http://localhost:9200/factgridwikidata_test_1_39-factgrid_general_1676538230
curl -X DELETE http://localhost:9200/factgridwikidata_test_1_39-factgrid_content_1676537535
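
Since deleting an index cannot be undone, it can help to generate the DELETE commands from the index list and review them before running anything. The sketch below is a dry run only: it reads index names on stdin and prints one curl command per index whose name starts with a given prefix. The function name and the default base URL are assumptions for illustration.

```shell
# Read index names on stdin (one per line) and print, without executing,
# a DELETE command for every index whose name starts with the prefix.
print_delete_commands() {
    prefix=$1
    base_url=${2:-http://localhost:9200}
    # Refuse an empty prefix, which would match every index.
    [ -n "$prefix" ] || return 1
    while read -r index _; do
        case $index in
            "$prefix"*) printf 'curl -X DELETE %s/%s\n' "$base_url" "$index" ;;
        esac
    done
}
```

For example: `curl -s 'http://localhost:9200/_cat/indices?h=index' | print_delete_commands factgridwikidata_test_1_39`, where `h=index` limits the listing to the index name column.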

du -sh /var/lib/elasticsearch/ confirmed the reduced disk usage (from 1 GB to a few hundred kB, as far as I recall). I then prepared the 1.39 LocalSettings so that I would be able to bootstrap CirrusSearch without the previous index issues (i.e. $wgWBCSUseCirrus needed to be true for the initial maintenance script already):

wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
wfLoadExtension( 'WikibaseCirrusSearch' );
$wgDisableSearchUpdate = true;
if ( PHP_SAPI === 'cli' ) {
	$wgSearchType = 'CirrusSearch';
	$wgWBCSUseCirrus = true;
}
# note: I missed something here, see below

Finally, at some point I had changed the 1.39 LocalSettings to reference the real database settings again, and also smoothed out other differences between the 1.35 and 1.39 LocalSettings that weren’t necessary (e.g. removing the $wgDebugLogFile again); I don’t remember when exactly I did this, and my notes don’t say, I’m afraid.

Actual upgrade to MediaWiki 1.39

At 13:00 CET, I added the same $wgReadOnly code to the 1.35 LocalSettings as well, stopped mediawiki-jobqueue.service, then took an SQL dump:

(umask go-rwx && mysqldump -uwikidata -pREDACTED factgridwikidata > mysqldump-2023-02-18.sql)

I believe this took about 12 minutes. Then I ran php maintenance/update.php --quick, which took about an hour as expected. I moved the symlinks in /var/www around so that /w would point at 1.39 and 1.35 would be unreachable:

mv w-1.39-unreachable-REDACTED w-1.39 && ln -sfT w-1.39 w && mv w-1.35 w-1.35-unreachable-REDACTED
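
The -T flag is what makes the symlink swap safe: without it, ln would follow the existing w symlink into the old directory and create the new link inside it instead of replacing w. A self-contained sketch in a temporary directory (the directory names are placeholders, and -T assumes GNU ln):

```shell
demo=$(mktemp -d)
mkdir "$demo/w-old" "$demo/w-new"

# Start with w pointing at the old tree.
ln -s w-old "$demo/w"

# Replace the symlink itself; -T treats the destination as a plain file
# name rather than as a directory to create the link in.
ln -sfT w-new "$demo/w"

readlink "$demo/w"
```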

Then I tried to load the wiki; it didn’t really work, so I restarted apache2.service, which seemed to fix those issues. (I don’t know why the restart was necessary, though.) After testing a bit more, I commented out $wgReadOnly, making the 1.39 wiki read-write again; then I updated mediawiki-jobqueue.service with the improvements from the 1.39 version (more frequent restarts) and started it again.

CirrusSearch

Next, I went to set up CirrusSearch for the production 1.39 wiki. I ran

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

which took 1–2 minutes, then commented out $wgDisableSearchUpdate and made a test edit on the sandbox item. I was hoping that, after the job queue finished running, its ?action=cirrusDump would show search data, but it remained empty ([]); I don’t know why. Eventually I decided to go ahead with the CirrusSearch work anyway:

php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip  # took 19 minutes
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse  # took 20 minutes

The sandbox item cirrusDump now showed proper search data, so it worked out in the end. I then changed the LocalSettings to:

$wgSearchType = 'CirrusSearch';
$wgWBCSUseCirrus = true;

In hindsight, this was premature – I should have waited until the job queue was finished running, which took about 2½ hours more. (I did indeed receive a complaint about missing search results during that time.)
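
One way to avoid switching too early is to wait for the job queue to drain before changing $wgSearchType. The following sketch polls a command that prints the number of queued jobs and returns once that number reaches zero. The function name is hypothetical, and it assumes that the counting command (for instance php maintenance/showJobs.php) prints a single integer.

```shell
# Usage: wait_for_jobs INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints 0, sleeping between attempts.
wait_for_jobs() {
    interval=$1
    shift
    while :; do
        count=$("$@") || return 1
        if [ "$count" -eq 0 ]; then
            return 0
        fi
        printf 'still %s jobs queued\n' "$count" >&2
        sleep "$interval"
    done
}
```

It could be used as `wait_for_jobs 60 php maintenance/showJobs.php && echo "job queue is empty"`.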

Later, I was asked why haswbstatement searches weren’t working yet. It turns out that no statements are indexed by default – you have to set $wgWBRepoSettings['searchIndexTypes'] to a list of data types you want to have indexed. I copied the list from the production config:

// same searchIndexTypes as in Wikimedia production (as of operations/mediawiki-config.git, commit f16227431c)
// we do not set $wgWBRepoSettings['searchIndexPropertiesExclude'], so all properties of these types are indexed, without exceptions
$wgWBRepoSettings['searchIndexTypes'] = [
        'string',
        'external-id',
        'url',
        'wikibase-item',
        'wikibase-property',
        'wikibase-lexeme',
        'wikibase-form',
        'wikibase-sense',
];

Then I rebuilt the search index following the (1.B) instructions from the CirrusSearch README:

php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php

And after the job queue finished running once more, haswbstatement searches were working.

Later fixes

Because the uploads directory had been copied without preserving ownership, it was owned by factgrid instead of www-data; fixed on 2023-05-14 with:

chown -R www-data:www-data /var/www/w/images/
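
Before (or after) such a chown, it can be useful to list what is actually affected. A minimal sketch, with a hypothetical function name, that prints every file or directory under a path that is not owned by the expected user:

```shell
# Print all paths below DIR (including DIR itself) not owned by OWNER.
not_owned_by() {
    owner=$1
    dir=$2
    find "$dir" ! -user "$owner" -print
}
```

For the case above this would be `not_owned_by www-data /var/www/w/images/`; empty output means the ownership is consistent.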