Dumps/Snapshot hosts

From Wikitech

Snapshot (XML dumps generation) cluster information

Hardware

These hosts generate the sql/XML dumps or the "misc cron" dumps. For information about the hosts that serve them, see Dumps/Dump servers.

In eqiad:

  • snapshot1010: primary and monitor, Dell PowerEdge R440, Debian 10 (buster), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1011: primary, Dell PowerEdge R440, Debian 11 (bullseye), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1012: primary, Dell PowerEdge R440, Debian 10 (buster), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1013: primary, Dell PowerEdge R440, Debian 10 (buster), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1014: spare, Dell PowerEdge R440, Debian 11 (bullseye), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1015: wikidata_fill_in, Dell PowerEdge R440, Debian 11 (bullseye), 64GB RAM, 2 Intel Xeon Gold 6240 18-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1016: primary (other dumps), Dell PowerEdge R650xs, Debian 11 (bullseye), 64GB RAM, 2 Intel Xeon Gold 6326 16-core CPUs, 2 480GB SSDs in software raid 1
  • snapshot1017: spare, Dell PowerEdge R650xs, Debian 11 (bullseye), 64GB RAM, 2 Intel Xeon Gold 6326 16-core CPUs, 2 480GB SSDs in software raid 1

Current setup

The dumps monitor, which cleans up stale locks and generates the index.html file, currently runs on snapshot1010.

Worker nodes:

The fulldumps.sh script is supplied with a wikitype parameter, which can be any one of enwiki, wikidatawiki, or regular:

  • snapshot1012 is configured with a wikitype of enwiki
  • snapshot1011 is configured with a wikitype of wikidatawiki
  • snapshot1010 and snapshot1013 are configured with a wikitype of regular

This wikitype does not mean that the host will be used exclusively for this type of dump, but it will prioritize that type.

For example, when the scheduler on snapshot1012 has completed the enwiki dump, it checks whether any dumps from the list of big wikis remain to be done. If none are outstanding, it then checks the remaining wikis and starts on those. The wikidatawiki type behaves in the same way.
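The priority order described above can be sketched roughly as follows. This is only an illustration, not the actual scheduler code: the function name pick_next_wiki, its parameters, and the example wiki names are hypothetical, locking between hosts is ignored, and the assumption that a regular host simply skips the first step is not confirmed by this page.

```python
from typing import Optional, Sequence, Set


def pick_next_wiki(
    wikitype: str,
    big_wikis: Sequence[str],
    other_wikis: Sequence[str],
    done: Set[str],
) -> Optional[str]:
    """Return the next wiki a host should dump, or None if nothing is left.

    Hypothetical helper: it only models the ordering described in the text.
    """
    # 1. A host with wikitype enwiki or wikidatawiki does that wiki first.
    #    (Assumption: a "regular" host has no such priority wiki.)
    if wikitype in ("enwiki", "wikidatawiki") and wikitype not in done:
        return wikitype
    # 2. Next, any wiki from the big-wikis list that is still outstanding.
    for wiki in big_wikis:
        if wiki not in done:
            return wiki
    # 3. Finally, whatever remains among the other wikis.
    for wiki in other_wikis:
        if wiki not in done:
            return wiki
    return None


# Illustrative wiki lists; the real lists live in the dumps configuration.
big = ["dewiki", "frwiki"]
other = ["aawiki"]
print(pick_next_wiki("enwiki", big, other, done=set()))
print(pick_next_wiki("enwiki", big, other, done={"enwiki"}))
```

With nothing done yet, the enwiki host picks enwiki; once that is complete, it falls through to the first outstanding big wiki.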

  • snapshot1015 is used to run a special fillin_wd script, with the aim of parallelizing the wikidata dumps even more. The script waits until the 7th of the month, then starts working in parallel with snapshot1011 to dump the page content history files.
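The date gate for the fill-in job might look something like the sketch below. It is not taken from the fillin_wd script itself: the names FILLIN_START_DAY and fillin_may_start are hypothetical, and reading "waits until the 7th" as "on or after the 7th" is an assumption.

```python
import datetime

# From the text: the fill-in job does not start before the 7th of the month.
FILLIN_START_DAY = 7


def fillin_may_start(today: datetime.date, start_day: int = FILLIN_START_DAY) -> bool:
    """Return True once the day of the month has reached start_day.

    Hypothetical helper; assumes the job may start on the 7th or any later day.
    """
    return today.day >= start_day


print(fillin_may_start(datetime.date(2024, 3, 6)))
print(fillin_may_start(datetime.date(2024, 3, 7)))
```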

The testbed hosts are used for catch-up runs, emergency testing of bug fixes, at-scale tests of misc cron jobs, and so on.

For info on running the dumps, see Dumps#Starting_dump_runs.

Other tasks

All misc cron jobs apart from the sql/XML dump runs take place on snapshot1016.