After long delays because I’ve been too busy, I’ve finally found a week to get back to this project. This hasn’t led to much new content, but I’ve made some changes behind the scenes which will make things easier in future.
- the biggest change is that the licence is now Creative Commons Attribution ShareAlike (CC-BY-SA). This will stop big companies from reusing my data in paywalled resources (not that they had been, and I doubt that they would think of it, but I want to be sure) and will increase the range of sources that I can import data from. The main disadvantage is that it will stop some other projects from reusing my data because their licences aren’t compatible.
- new property for historical people: Spelt own name. This records how they signed their names. Page names now have less need to reflect the original spelling, which is more flexible. It could also be useful to people studying literacy.
- new properties for serial publications: Contents last updated and New issues expected in. These will make it easier to keep track of whether journal contents are up to date.
- Collections can now have different values for Instance of. This makes it easier to include or exclude series of pay warrants when querying for sources, and will provide a way of indexing indemnity cases in SP 24 in future.
- forms and templates have been overhauled. Some of this will make the pages better structured, but a lot of it is behind the scenes and won’t make an obvious difference to reading pages or querying the data.
- removed spurious precision from WGS84 coordinates. They should now only have 5 decimal places, which is accurate to about 1 metre.
- maps have been moved out of the main namespace. Pages that used to show a map now usually show a link that you can follow to view the map. This will make pages less cluttered and should save some resources on both the server and client sides. It’s also a possible workaround for a bug that stopped maps from displaying properly if the LiteSpeed cache was enabled, although the server load is so low at the moment that I don’t need a cache.
- the hierarchy of subject headings has been rearranged.
- units that never move can have a permanent location set instead of a repeatable template.
- Churches and cathedrals are now types in their own right. I’m still not planning to add any more specific types for buildings because defining a house or a castle or a fortification is so difficult.
- ID properties for Early Modern Letters Online and Six Degrees of Francis Bacon have been removed because Wikidata is already a spine for these and there’s no need to duplicate them.
- fixed some broken links.
- the GitHub repository has moved. This is partly because of the new licence, and partly to make it easier to share other data that isn’t wiki dumps but is still related to the project.
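The coordinate change in the list above can be illustrated with a minimal Python sketch. This is not the code used on the wiki: the function names are made up for illustration, and the Earth is assumed to be a sphere with a mean radius of 6,371 km, which is good enough to show why 5 decimal places corresponds to roughly a metre.

```python
import math

# Assumption: a spherical Earth with the commonly used mean radius.
EARTH_RADIUS_M = 6_371_000


def round_coordinate(value: float, places: int = 5) -> float:
    """Drop spurious precision from a WGS84 latitude or longitude."""
    return round(value, places)


def latitude_step_metres(places: int = 5) -> float:
    """Ground distance for one unit in the last decimal place of latitude."""
    return math.radians(10 ** -places) * EARTH_RADIUS_M


def longitude_step_metres(latitude: float, places: int = 5) -> float:
    """Same for longitude, which shrinks with the cosine of the latitude."""
    return latitude_step_metres(places) * math.cos(math.radians(latitude))


if __name__ == "__main__":
    print(round_coordinate(51.507351234))
    print(latitude_step_metres())
    print(longitude_step_metres(52.0))
```

One step in the fifth decimal place of latitude comes out at a little over a metre, and a step of longitude is shorter still at British latitudes, so nothing meaningful is lost for settlement locations.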
Things have been quiet here for longer than I expected because I’m busy earning money, which is the best thing to do at a time like this. Meanwhile, someone else’s project needs help, and helping it will also benefit By The Sword Linked in the long term.
Index Villaris is a project to create freely reusable geodata from a directory of places in England and Wales printed in 1680. The printed book lists 24,000 settlements along with their latitude and longitude, and the county, hundred and rural deanery they were in. This data will obviously be very valuable for By The Sword Linked and for many other things. It will allow me to:
- add every English settlement that existed. Currently I have 12,000 wiki pages for settlements that had administrative units named after them, and an offline list of another 1,000 names of units that I haven’t yet been able to match to settlements.
- add Welsh settlements more easily. So far I haven’t tried to tackle Wales.
- get more accurate coordinates for the locations of settlements in the 17th century. My existing coordinates are taken from Ordnance Survey data showing where the settlement is now. Some will have moved because of landscaped parks, coastal erosion or other reasons.
- link settlements to hundreds and deaneries. This will make my current practice of using settlements as proxies for parishes and townships more effective, and would make it easier to add parishes in future.
An alpha version of the data has already been released. The project team have been able to identify and locate 95% of the settlements in the list, but they need help with the other 5%. The instructions page gives details of how to help. You can do the task in a web browser on a computer. I’ve found it easy to use and have been able to offer a few suggestions. The unidentified settlements are often misplaced on the map, which is why the correct identification wasn’t always obvious. Sometimes this is just an error in the coordinates in the original printed edition. For example, Shenley Brook End and Warrington (both in Buckinghamshire) were listed under the correct county and hundred but had the wrong coordinates printed. They were both easy to correct because they are fairly well-known places. Printed coordinates of some places are so wrong that they appear in the wrong county or even in the sea! In other cases, the printed counties or hundreds may be wrong, and the place names may use archaic phonetic spellings. Some settlements may be very small and obscure. That’s why more people with detailed knowledge of local history are needed to help.
The wiki now has pages for ecclesiastical units in England and Wales. You can drill down from the Church of England to find provinces, dioceses, archdeaconries and rural deaneries. The relationships in this hierarchy are all referenced to Ecton’s Liber Valorum with links to page images at the Internet Archive. Parishes in London and Southwark will be added within the next few weeks. Parishes in other towns and cities that were divided into more than one parish will follow eventually, but I don’t know how long it will take. I have no plans to import rural parishes because I think settlements will be adequate for record linkage.
From now on I’m going to keep importing as much data as I can but I will also stop promoting the project for a long time because it’s still not ready to get much attention, and trying to keep people interested is too much of a distraction for me. I would rather do things in the order and at a pace that’s most convenient for me. This means that I won’t be posting much on this blog unless I want to share something especially important, and I’ve deactivated the project’s Twitter account. I hope that I’ll be ready for a big relaunch in the autumn of 2022, but it might take even longer if unexpected things happen. Meanwhile, the wiki and GitHub repository will still be available and will still be updated every so often.
Covid interferes with everything eventually but it has only interfered with this project in a very indirect way: I needed new glasses and I waited to get vaccinated before booking an eye test. Then when I got new glasses, I had to get used to varifocals. That’s all out of the way, and I’m making progress with the wiki again.
First of all, I’ve imported a few hundred more pages. These are mostly streets and buildings in London. We also have a page for every cathedral in England and Wales.
Some other changes that have been made in the last few months:
One of the advantages of doing this project on my own with no funding or institutional support is that there are no deadlines. A disadvantage is that working on the project has to come third behind paid work and peer-reviewed publications. That’s why progress has been so inconsistent in the past year – it’s nothing to do with coronavirus, because this is a project that I can do from home at the moment. In the first half of 2020 I made huge progress and got the size of the wiki to more than 20,000 pages (mostly settlements in England and Scotland). Then it stopped as other things got in the way. For the last seven months there have only been occasional manual edits, and long periods with apparently nothing happening. But I have been doing a lot of offline work that isn’t visible on the wiki.
The biggest task that I’m working on is preparing to import the Propositions lists from SP 28/131. These are three account books (one doesn’t even have a page yet) that list people’s names, addresses, and occupations, and details of the horses and arms that they loaned to Parliament in 1642 and 1643. This is an amazing source that has been a big part of my research on horse supply, and I want other people to be able to use it more easily. Among other things, it allowed me to identify Davy the horseman from Nehemiah Wharton’s letters, and even find the colour of his horse! Although I’ve made a lot of progress with record linkage, it still needs a lot more work before I can import it, and I don’t know when I’ll find the time because I’m lucky enough to have a good amount of paid work lined up, and a potential journal article that I can finish writing without any archive trips. Because of this, I might use any spare time I get to work on smaller, easier tasks that I can finish quickly, so at least something will be happening on the wiki, even if it’s not the most exciting thing. I’m deliberately being vague about what these tasks might be because it’s hard to predict what will be easiest to do in the time I’ve got.
Meanwhile, working on record linkage for the Propositions lists has led to a little bit of visible progress. Today I imported about 40 English settlements that were previously missing. The latest dumps of wikitext pages and RDF have been uploaded to GitHub.
I’ve been working on importing wiki pages for settlements in England. This post follows on from the one about importing Scottish places and will refer back to that instead of repeating all the details, but for England some things will be different.
Every time I think I’ve finalised the data structures of the semantic wiki, I realise that something could be done better or that querying for a certain thing is too difficult. This week I’ve given the wiki a big overhaul, including:
- upgraded software to Semantic MediaWiki 3.1.6, Maps 7.18.0 and Page Forms 4.9. As far as I know, this hasn’t broken anything yet.
- imported about 750 regicides and MPs who were never peers. See Category:Agents for the latest list of historical people. Regicides are also listed in Category:Regicides of Charles I and MPs have personnel relationships with the House of Commons in the Short and/or Long Parliament. These imports are based on Wikidata items that I found via Wikipedia categories. There may be errors that I haven’t found yet.
- pages for historical people, places, organizations, and events now show up to 10 linked sources or a message that there are no linked sources, so you don’t have to click a link to find out.
- changed the properties that subobjects use to link agents to organizations, and participants to events, so that they’re not the same ‘has parent’ and ‘has subordinate’ properties used for command structure relationships. I think this should make queries simpler and more efficient, and avoid possible confusion, because there’s less need to check what the subobject is an instance of.
- added a new property, ‘has allegiance’, to the subobject for event participants to show which side they were on.
- agents and units now use different types of subobject (but with all the same properties apart from ‘is instance of’) to link to events. This makes it easier to query for participants and simplifies the roles that need to be assigned to participants.
- roles in events have been redefined. See Category:Event participant roles for the latest list.
- royalist allegiance factions have been merged into one: royalist forces. This is simpler and allows for cases where we know that someone served as a royalist soldier but not when or which king they served.
- where an agent is a member of an organization that can be assigned allegiance, the allegiance faction should also be entered as a second value in the personnel relationship. This makes queries for soldiers by which side they were on simpler and more precise. It also allows for cases where we know that a soldier was on a certain side but not which unit they were in.
- updated Help:Data structures to reflect changes to properties and templates.
- added GeoNames IDs to some more Scottish settlements, so coverage is now about 1/3, but linking GeoNames and Ordnance Survey data has turned out to be quite difficult.
- fixed some broken redirects and missing categories.
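To give an idea of what a query by allegiance might look like after these changes, here is a rough Python sketch that builds a request URL for Semantic MediaWiki’s `ask` API module. The API address and the property name `Has allegiance` are hypothetical placeholders and may not match the wiki’s real configuration or property names; only `Category:Agents` and the ‘royalist forces’ faction come from the list above.

```python
from urllib.parse import urlencode


def build_ask_url(api_url, conditions, printouts=(), limit=10):
    """Build a URL for the Semantic MediaWiki 'ask' API module.

    conditions: strings such as 'Category:Agents' or 'Property::Value',
                each of which is wrapped in [[...]].
    printouts:  property names to return, each prefixed with |?.
    """
    query = "".join(f"[[{condition}]]" for condition in conditions)
    query += "".join(f"|?{prop}" for prop in printouts)
    query += f"|limit={limit}"
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical API address and property name, for illustration only.
    url = build_ask_url(
        "https://example.org/w/api.php",
        ["Category:Agents", "Has allegiance::Royalist forces"],
        printouts=["Has allegiance"],
        limit=10,
    )
    print(url)
```

Because the allegiance is recorded directly on the relationship, a single condition is enough in this sketch, with no need to check what each subobject is an instance of.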
This month I’ve started working on importing large amounts of data for places in Great Britain. I’m writing this blog post as I go, so that I can easily remember what I did. Imports of authors seem like old news now, so I might not write any more detailed posts about how I did it. See below for more details of how I did the geodata. This is a very long post by today’s standards, so the ‘Too Long, Didn’t Read’ version is that I’ve imported pages for:
I’ve finished working on The Power of Petitioning, and the shutdown is giving me plenty of time to work on By The Sword Linked, so here’s a quick summary of where I am and where I’m going.
The biggest news is that there are now wiki pages for 1,238 authors who weren’t alive during the civil wars (see Authors category for a full list). Of these, 1,049 are linked to Wikidata IDs. About 25% of authors imported so far are women. Not ideal but it may be a fairly accurate reflection of who has publications and theses relevant to the British Civil Wars.
With enough authors in place, I’ve been able to import more publications and theses. For example:
- Midland History is a journal with links to about 60 articles. This is all of the articles that I think are relevant to the civil wars. Wikidata seems to have complete coverage of this journal up to 2017, which made the imports easier.
- Helion Century of the Soldier is a series of monographs and edited collections with links to each volume that covers the civil wars. The volume pages have links to the publisher’s website.
- Theses category lists a small selection of theses, mostly recent and mostly by women, which is encouraging for the future.
- Open access category is a quick way to find sources that are free to view online, with subcategories for books, articles and theses.
Now that I’ve tested every type of entity at a big enough scale, I think I’ve finalised the data structures at last, although I can’t rule out minor changes if I come across something that needs fixing.
The current situation hasn’t derailed this project but it has changed my priorities for the future. The news that The National Archives of the UK are closed until further notice makes it especially important to share transcripts of Public Records that I already have copies of. Before I do that, I need to import more people and places to make it easier to link sources to subjects. I expect to be doing that for most of April. Then from May onwards I’ll try to share as much material as I can from SP 28. This will also demonstrate the value of the Open Government Licence. In between doing all that, I might write some more detailed blog posts about how I imported data for authors and publications.
As it’s the start of a new year, this post is a review of progress so far and a rough plan for the coming year.