As it’s the start of a new year, this post is a review of progress so far and a rough plan for the coming year.
The blog and the wiki have been fairly quiet because I’m still busy transcribing petitions for The Power of Petitioning. One benefit of this work is that later this year British History Online will publish transcripts of 400 petitions from the State Papers and several hundred more from the House of Lords, all free to view. Once I’ve finished it I should have more time for the wiki. This is where I’ve got to and where I hope to go…
I first had the idea for this project about four years ago, although it grew out of lots of previous projects that I worked on (see this post). For the next two years, it consisted entirely of a growing collection of notes scribbled on paper. Two years ago I got these notes into order, typed up the ones that were still relevant (my ideas kept changing) and installed Semantic MediaWiki on localhost to start working on a demo. Just under a year ago, I started this blog and announced the project. From then on, you can read about progress in the blog archives. The most important milestone was when I launched the public wiki on 18 March 2019. Since then there has been a lot of testing and revision of the data structures. In the last post, on 25 October 2019, I announced the import of Army Committee pay warrants to test the data structures for manuscript texts.
Since I last posted I’ve mostly been busy with transcription work, but I have made some more changes. In November, I imported pages to represent TNA, SP 28/140/3. This is an account book of supplies delivered into the Ordnance Office. It has a separate wiki page to represent each physical page of the manuscript. This helped me to test the data structures for manuscripts that are divided into sections. Each page has extracts of entries relating to saddles and horse harness for the New Model Army. These entries are linked to the people who supplied them, so in some cases the linked sources for these people will show you the account of delivery and warrant for payment relating to the same transaction. This helps to demonstrate what a powerful tool Semantic MediaWiki can be for linking historical records.
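As a rough illustration of why this linking is useful, here is a minimal Python sketch of the idea. It is not how Semantic MediaWiki stores or queries data, and the record identifiers and supplier names are made-up placeholders rather than real entries from the wiki. It only shows that once each source record lists the people it is linked to, the sources for one person can be gathered in one place.

```python
from collections import defaultdict

# Hypothetical source records: (source id, kind of record, linked people).
# None of these identifiers or names come from the real wiki.
sources = [
    ("account-page-1", "account of delivery", ["Supplier A"]),
    ("warrant-1", "warrant for payment", ["Supplier A"]),
    ("warrant-2", "warrant for payment", ["Supplier B"]),
]


def sources_by_person(records):
    """Build an index from each person to the sources linked to them."""
    index = defaultdict(list)
    for source_id, kind, people in records:
        for person in people:
            index[person].append((source_id, kind))
    return dict(index)


linked = sources_by_person(sources)

# Supplier A's entry now brings the delivery account and the payment
# warrant together, which is the effect described above.
for source_id, kind in linked["Supplier A"]:
    print(source_id, "-", kind)
```

In the wiki itself the equivalent lookup happens through the semantic properties on each page, so nothing like this index has to be maintained by hand.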
There have been some minor changes to data structures and editing forms. There is now a separate form for entity type definitions, which I think is more convenient and more rigorous than the previous practice of lumping entity types in with miscellaneous property values of other kinds. The property “published as part of” now allows multiple values where it’s used to link a book edition to a series, as I found that in a few cases the same edition of a book can be part of more than one series at the same time. A thesis can now have a DOI.
Today I imported a batch of authors, taking the total to nearly 400. Most of these are linked to Wikidata IDs. The next step is to start importing authors who don’t already have Wikidata IDs. I have over 500 more authors reconciled with VIAF IDs waiting to be imported to Wikidata when I have time. Then there’s another batch that I couldn’t get unambiguous VIAF IDs for. I’m going through them to see if they can be matched to other IDs. If they can, they will also be imported to Wikidata. If not, they can go straight into the By The Sword Linked wiki without Wikidata IDs. I’ve found OpenRefine very useful for all this: it can easily manipulate large datasets, semi-automatically reconcile them against external identifiers, and import batches of data to Wikidata. It’s very important, though, to check the matches manually to make sure they’re correct and unambiguous.
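The sorting of authors described above can be summarised in a short Python sketch. This is only an illustration of the decision steps, not the OpenRefine workflow itself. The `Author` class, the `triage` function, the route labels and the example identifiers are all hypothetical, and the identifiers are placeholders rather than real Wikidata or VIAF IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Author:
    """Hypothetical record for one author awaiting import."""
    name: str
    wikidata_id: Optional[str] = None
    viaf_id: Optional[str] = None  # set only if the VIAF match is unambiguous
    other_ids: Dict[str, str] = field(default_factory=dict)


def triage(author: Author) -> str:
    """Decide which route an author takes, following the steps above."""
    if author.wikidata_id:
        # Already on Wikidata: import to the wiki with the ID attached.
        return "import to wiki with Wikidata ID"
    if author.viaf_id or author.other_ids:
        # Matched to VIAF or another identifier: add to Wikidata first.
        return "import to Wikidata first"
    # No usable identifier at all: goes straight into the wiki.
    return "import to wiki without Wikidata ID"


# Placeholder authors and identifiers, purely for illustration.
batch = [
    Author("Author One", wikidata_id="Q-placeholder"),
    Author("Author Two", viaf_id="viaf-placeholder"),
    Author("Author Three", other_ids={"other-scheme": "id-placeholder"}),
    Author("Author Four"),
]

for author in batch:
    print(author.name, "->", triage(author))
```

The sketch assumes the matches have already been checked by hand, so a record only carries an identifier once the match is known to be correct and unambiguous.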
Once I’ve added all the authors I need, I can test the data structures for books, articles, theses and serial publications on a bigger scale. When I’m as satisfied as I can be with all the data structures, I can import bigger batches of data for people, places and units (more details of plans for that in future posts). I hope that by the end of this year, the wiki will have tens of thousands of pages and will be a useful resource in its own right rather than a demo that has potential.