Another progress update

Blog posts haven’t been very frequent recently because I’ve been busy working on the wiki. This post is a catch-up on what I’ve been doing over the last two weeks and what I’ve learnt from it.

Although I said before that the data structures and page layouts were provisionally finalised, I’ve since made some changes and am considering making more. The advantage of testing on a real server with a substantial number of pages is that it highlights potential problems that weren’t obvious with just a few test pages. Now that I’ve imported a few hundred pages and want to do things with them, I can see that some things are unnecessarily difficult.

Counties

In the last post I’d already imported a selection of authors. The next big batch after that was counties of England, Scotland and Wales. What counts as a county isn’t straightforward. The entities that I’ve imported are in three groups:

  • the shires. These are well known and are rigorously defined by the Historic Counties Standard.
  • counties corporate. A county corporate, or county of itself, was a town or city that had the status of a county and was mostly separate from the shire that surrounded it. There’s no definitive list of these, and sources disagree about some of them, so it will be very useful to have wiki pages where the evidence can be pulled together.
  • subdivisions of counties and other anomalous areas that had some functions of a county. These include Berwick, the Isle of Ely, and the ridings of Yorkshire and Lincolnshire.

I’ve imported pages for all of these (the category Counties of Great Britain lists them all). In future there will also be separate pages for each function of each county, such as quarter sessions, sheriff, lieutenancy, and parliamentary constituency. This will make it possible to model civil administration hierarchies more rigorously.

There are also redirects and categories for the Gaelic names of Scottish counties and the Welsh names of Welsh counties. This means that if you type a Gaelic or Welsh name in the search box, or click a link in those categories, the wiki will forward you to the correct page, which is stored under its English name.

Setting these up exposed some minor problems with accented characters. First, search suggestions don’t take accents into account, so for example if you type “e” it won’t match any accented form of e. I can fix this by upgrading the search engine to CirrusSearch, which is what Wikipedia uses. Second, MediaWiki sorts all accented characters after z, which can make pages appear in the wrong order in categories. This is easily fixed because it’s possible to set a sort key that’s different from the page name, but it means I’ve had to reinstate form and template fields for sort keys that I previously deleted because I thought they wouldn’t be needed.
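As an illustration of the sort key fix, here is a minimal Python sketch of one way to derive an accent-free sort key from a page name. It is an assumption about how such keys could be generated, not how the wiki's forms and templates actually do it; the function name sort_key and the example names are mine.

```python
import unicodedata


def sort_key(page_name: str) -> str:
    """Return an accent-free copy of a page name for use as a sort key."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", page_name)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Illustrative names only: sorting by code point puts "Môn" after
# "Morgannwg", while sorting by the accent-free key puts it first.
names = ["Morgannwg", "Môn"]
by_code_point = sorted(names)
by_sort_key = sorted(names, key=sort_key)
```

The resulting string could then be entered in a sort key field, or passed to MediaWiki's DEFAULTSORT magic word, so that the category listing uses it instead of the page name.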

Battles and Sieges

There are now pages for some battles and sieges, the places where they happened, and some of the armies and garrisons that took part in them. I started with the Wikipedia categories for battles and sieges of the British Civil Wars, then eliminated some sieges because their Wikipedia pages cover more than one siege of the same place. These will have to be dealt with eventually, but to start with I wanted a selection of easy cases that have Wikidata IDs. Once I had a list of Wikipedia page names, I used OpenRefine to get the corresponding Wikidata IDs from Wikipedia’s API, reconcile the IDs with Wikidata, and then extract the dates of battles from Wikidata. After this, I used a spreadsheet to manually edit and expand the data, including setting the wiki page names, adding descriptions, and linking to places and armies. The data for armies and garrisons was manually entered in another spreadsheet. Most of these don’t have Wikidata IDs yet.
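The lookup of Wikidata IDs from Wikipedia page names can also be scripted. The following is a minimal Python sketch, not the OpenRefine workflow described above. It assumes the English Wikipedia endpoint and uses the MediaWiki API's pageprops query, which exposes the Wikidata ID as the wikibase_item property. The function names are hypothetical, and the code only builds the request URL and parses a response, leaving the HTTP call itself to whatever client you prefer.

```python
import json
from urllib.parse import urlencode

# Assumption: English Wikipedia; other language editions have their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build a URL asking Wikipedia for the Wikidata ID of each page title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",   # only return the Wikidata ID
        "redirects": 1,              # follow redirects to the target page
        "titles": "|".join(titles),  # the API limits how many titles fit in one request
        "format": "json",
        "formatversion": 2,          # pages are returned as a list
    }
    return API_URL + "?" + urlencode(params)


def extract_wikidata_ids(response_text):
    """Map page titles to Wikidata IDs from a formatversion=2 JSON response."""
    data = json.loads(response_text)
    ids = {}
    for page in data.get("query", {}).get("pages", []):
        # Pages without a Wikidata item have no "pageprops" entry,
        # so they are recorded with None.
        ids[page["title"]] = page.get("pageprops", {}).get("wikibase_item")
    return ids
```

Pages that come back with None are the ones that would need a Wikidata ID to be found or created by hand.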

To create the places, I manually copied them from Ordnance Survey OpenNames CSV files, which I had downloaded last year (when I do more places I’ll download a newer version of the data). The version of the data I used has National Grid references but not WGS84 coordinates, so I had to use the OS API to get WGS84 coordinates for each place ID. I’m hoping that the CSV download will include these in future, as it would save a huge number of API requests when I come to import all the places in Britain.

It’s easy to make mistakes with the record linkage between battles and places, so it’s good that a wiki makes it easy to change things later. Sometimes the name of a battle and the name of the place it was named after have diverged. For example, Tippermuir is the name of the battle, but the place is now known as Tibbermore. Sometimes the mistake is mine: I knew that the battle of Cheriton was also called Alresford, and the OS data only had one place called Alresford, so I copied that without checking further. I later realised that I’d wrongly picked a place in Essex, and that the place near Cheriton is New Alresford.

The way I’ve chosen to represent the locations of battles and sieges is to link them to settlements or buildings that they happened in or near, rather than to store coordinates for the event itself. I still like this approach because it allows for uncertainty (the exact locations of some battles are disputed or unknown) and can give a rough idea of the extent of the battlefield without making claims about its boundaries (for example, see the map of Naseby). But I’ve now found some limitations. When I designed it, I was mainly thinking about England, but some Scottish battles were in very obscure and remote places. Tullich, for example, is not listed as a named populated place by either the Ordnance Survey or Geonames. For this battle, I had to look around for a nearby place that had an external identifier, and settled on Bridge of Gairn, which only gives a very rough idea of where the battle was. The battle of Dalnaspidal has defeated me so far, as the name Dalnaspidal is now only used for a lodge and a disused railway station, and there’s nothing else nearby.

I’ve also found that querying for battles and sieges by date is difficult because the date properties for a battle are completely different from the date properties for a siege, and it won’t always be obvious to users whether I’ve modelled an event as a battle or a siege. So I’m considering some drastic changes to the forms and templates to get around this and make editing easier in future.
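To show the kind of change I mean, here is a minimal Python sketch of one possible approach: giving both kinds of event the same start and end fields so that one date query covers both. This is plain Python rather than the wiki's own query language, and it is only an assumption about how the model could be unified. The class, field and function names are hypothetical, and it assumes that a battle has a single date while a siege has a start and an end.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    name: str
    kind: str    # "battle" or "siege"
    start: date  # shared date fields, whatever the kind of event
    end: date


def from_battle(name, fought_on):
    """A battle fought on one day becomes an interval of a single day."""
    return Event(name, "battle", fought_on, fought_on)


def from_siege(name, began, ended):
    """A siege keeps its own start and end dates."""
    return Event(name, "siege", began, ended)


def events_between(events, first, last):
    """Return events whose date range overlaps the period first..last."""
    return [e for e in events if e.start <= last and e.end >= first]
```

With this shape, a user asking what happened in a given month doesn’t need to know whether an event was modelled as a battle or a siege, because the same two fields are compared in both cases.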

Books

The next big thing to do is to start importing books. In theory I’m still happy enough with my model for bibliographic data, but I’m less satisfied with how I’ve implemented it in practice because it can lead to lots of superfluous wiki pages and is potentially confusing for users who aren’t me. I’m thinking about collapsing major and minor editions into the same form, with options to represent work level and both edition levels on the same page if there’s only one of each level, and options to link to other pages for each level if there’s more than one.