Cataloguing SP 28 – By The Sword Linked

Lots of people who have researched the British Civil Wars will know of SP 28, also known as the Commonwealth Exchequer Papers, in The UK National Archives. It’s a very important, and mostly quite poorly catalogued, collection of financial records of the parliamentarian and Protectorate war effort. One of the main aims of this project is to gradually catalogue and index SP 28. I’ve now started importing catalogue data into the wiki.

This is an opportunity to test my data structures for representing archive collections as well as to demonstrate how useful this project could be. I’ve used a mixture of creating pages manually in the wiki, importing batches of data from TNA’s catalogue and importing batches of data from my own notes, which go into more detail than official catalogue entries.

The best place to start is the wiki page for the series SP 28. From there, you can drill down through subseries and pieces. I created the pages for each subseries manually because there aren’t very many of them and the number of catalogue levels varies. Each page includes an external link to the official catalogue record. There’s a link to the parent collection at the top of the page just below the title, and a list of child collections or texts under the ‘Contents’ heading. The contents are generated automatically by a semantic query that finds every page that links to this page via the ‘Has parent’ property. The query results will automatically update if any more links are added in future.
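To illustrate the idea behind the ‘Contents’ list, here is a minimal Python sketch of what such a query does in effect. It is not the wiki’s actual query: the page names and the `build_contents` function are invented for illustration, and the only thing taken from the description above is that each page points to its parent through ‘Has parent’.

```python
from collections import defaultdict

# Each tuple is (page, value of its 'Has parent' property).
# The subseries name is a made-up placeholder, not a real page title.
has_parent = [
    ("Example pay warrants subseries", "SP 28"),
    ("SP 28/1A", "Example pay warrants subseries"),
    ("SP 28/31", "Example pay warrants subseries"),
]


def build_contents(links):
    """Map each parent page to a sorted list of the pages that link to it."""
    contents = defaultdict(list)
    for page, parent in links:
        contents[parent].append(page)
    return {parent: sorted(children) for parent, children in contents.items()}


contents = build_contents(has_parent)

# Adding one more link and rebuilding is enough to update the parent's
# contents; nothing on the parent page itself has to be edited.
has_parent.append(("SP 28/57", "Example pay warrants subseries"))
updated = build_contents(has_parent)
```

The point of the design is that the link is only ever stored on the child page, so a parent’s contents list can never get out of step with its children.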

Next I exported data for each piece number from TNA’s catalogue, Discovery. Discovery allows up to 10,000 search results to be exported as a comma-separated values (CSV) file, which is very useful. In the days before Discovery, getting catalogue data meant scraping it out of the HTML code in the web pages, which was much less convenient. To get the data I needed, I just had to do an advanced search for the string “sp 28/”, limited to the series “SP 28” and only pieces, not other levels. The resulting CSV file was easy to rearrange into the CSV files that I use to generate pages. This just involved some drag and drop, and find and replace, in LibreOffice Calc. I didn’t even have to use OpenRefine. Then I ran the usual Python script on the CSV to turn it into wiki XML and imported it.
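The CSV-to-XML step can be sketched roughly as follows. This is not the script mentioned above, only a minimal illustration of the technique. It assumes a bare-bones MediaWiki-style import layout (`page`, `title`, `revision`, `text`), and the column names, the `Collection` template and the sample row are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_wiki_xml(csv_text):
    """Build a minimal MediaWiki-style import document from CSV text."""
    root = ET.Element("mediawiki")
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = row["title"]
        revision = ET.SubElement(page, "revision")
        # Hypothetical template call; the real pages may be built differently.
        wikitext = (
            "{{Collection\n"
            f"|description={row['description']}\n"
            f"|parent={row['parent']}\n"
            "}}"
        )
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")


# Illustrative row only; the description and parent are placeholders.
sample = "title,description,parent\nSP 28/31,Example description,Example subseries\n"
xml_out = rows_to_wiki_xml(sample)
```

A real import file would probably need more than this (a namespace declaration, for instance), but the core of the job is the same: one CSV row becomes one page element.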

After the import, I manually edited some of the pages so that pieces that are filed by county are linked to the relevant counties using the property ‘Has main subject’. This means that most English counties will now have some linked sources. For example, these are the query results for Warwickshire.

Some parts of SP 28 have item-level descriptions in Discovery, but I’m not importing those yet because I want to do some more testing on the data structures for manuscript texts. At this stage I’m concentrating on testing collections.

There are many other parts of SP 28 that don’t have any catalogue levels below piece, and the catalogue descriptions at piece level can be vague or even completely wrong. For some pieces, I expanded or corrected the descriptions based on my own notes. For the subseries of pay warrants up to the end of 1648 (pieces 1A to 57) I had enough data of my own to add another level of collections below piece level. Although these volumes of pay warrants are not catalogued in much detail in Discovery, they are reasonably well sorted. Each piece covers one or two months, and within it the warrants are sorted by the army or committee that created them, usually with another sequence of miscellaneous documents. There are often paper slips inserted in the volumes at the start of each group giving a brief description and the range of folio numbers covered. Sometimes these slips are inaccurate or completely missing, but even then, the documents are still sorted in the same way. I made my own spreadsheet of the folio numbers and subject of each section by looking at the original volumes (bulk orders made this easy to do) and then used that data to import wiki pages for each group of warrants, linked to the piece that they’re in. For example, the page for SP 28/31 lists its contents, including groups of warrants for the New Model Army. The descriptions for pieces up to SP 28/57 also give the number of boxes, bound volumes, and folios in each piece. I still need to do the warrants after 1648 (pieces 58 to 119), which will need another trip to Kew, but it’s not a high priority at the moment.
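As a rough sketch of how a spreadsheet of folio ranges could be turned into one record per group of warrants, here is a short Python example. The `WarrantGroup` class, the page title format and the `find_gaps` check are assumptions made for illustration, and the folio numbers are invented rather than taken from the real volumes.

```python
from dataclasses import dataclass


@dataclass
class WarrantGroup:
    piece: str        # the parent piece, e.g. "SP 28/31"
    first_folio: int
    last_folio: int
    subject: str

    def title(self):
        """Hypothetical page title built from the piece and folio range."""
        return f"{self.piece} ff. {self.first_folio}-{self.last_folio}"


def find_gaps(groups):
    """Return (expected_start, actual_start) wherever the folio runs do not join up."""
    ordered = sorted(groups, key=lambda g: g.first_folio)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_folio != prev.last_folio + 1:
            gaps.append((prev.last_folio + 1, cur.first_folio))
    return gaps


# Invented folio ranges, for illustration only.
groups = [
    WarrantGroup("SP 28/31", 1, 120, "New Model Army"),
    WarrantGroup("SP 28/31", 121, 200, "Miscellaneous"),
    WarrantGroup("SP 28/31", 205, 260, "Example committee"),
]
titles = [g.title() for g in groups]
gaps = find_gaps(groups)
```

A check like `find_gaps` is a cheap way to catch transcription slips in the spreadsheet before importing, given that the paper slips in the volumes are sometimes inaccurate or missing.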

There are no pages for individual warrants yet, and it will be a very long time before they’re all catalogued. I will be importing some Army Committee warrants fairly soon to test the data structures for manuscript texts. These will all be payments to saddlers, harness makers, and horse dealers who supplied the New Model Army, taken from my old PhD databases.

But the next thing to do is to overhaul combat events yet again because I’m still not satisfied. I’ve thought of a way to generalise the data structures so that they can potentially cover any kind of event, and store more detailed data about event participants without adding too many extra properties.