This project will do lots of useful things, and will be expandable in future, but it has to be kept within limits to be achievable. This post is a list of some things that will be left until later or not included at all.
Permanently out of scope
- relationships between individual people are already covered by Six Degrees of Francis Bacon (SDFB), so there’s no need to duplicate that work. Pages for historical people will link directly to their SDFB IDs. By The Sword Linked will focus more on linking people through the organizations they were members of, which will be modelled in more detail than SDFB groups.
- full name histories of people, places and organizations will not be represented by semantic data, but name changes can be represented by free text and redirects. Modelling name histories is a difficult task that would make the data structures more complicated and create far more work. Cultures of Knowledge is likely to do this better for people and places. Some organizations, such as London livery companies, have stable, well-known names, but 17th-century military units often didn’t have official names at all.
- image hosting: I don’t intend to allow uploads of images to the wiki as this will use more server resources and create extra copyright issues. In the long term, IIIF is probably going to be a better way of hosting images anyway, and it remains to be seen how far this can be integrated with MediaWiki.
- transcription interface: transcripts can be added where copyright allows, but these will have to be created outside the wiki. I have no plans to add a transcription interface or to offer support and training for transcribers because Marine Lives already does these things really well.
- some primary sources won’t be directly represented by semantic wiki pages. Lords Journals and Commons Journals can easily be cited by linking directly to British History Online, and these versions can’t be copied and reused. Calendar of State Papers Domestic and Historical Manuscripts Commission Reports will probably be integrated into the wiki pages for the manuscript sources they summarize rather than having their own pages.
- it would be legal to copy and republish all the existing High Court of Admiralty transcripts from Marine Lives, but in practice creating a fork would mean more work for both projects, so it’s better to link to them than copy them.
- SPARQL: it’s possible to use a SPARQL store as a back end for Semantic MediaWiki, which is supposed to make internal queries more efficient as well as making it possible to provide a public SPARQL endpoint. But this would be more difficult to set up and would increase hosting costs. I don’t believe there’s much demand for a SPARQL endpoint because most people have no idea how to use it, and those who do know would probably rather have all of the RDF in a dump file (which I do aim to provide).
- occupations: early-modern occupations are very difficult to model as structured data because descriptors can be vague, and for Londoners it’s often not clear whether they refer to an actual trade or to company membership, which were not always linked. I hope Cultures of Knowledge will be able to tackle this problem better. In the meantime, information about people’s occupations can be recorded as free text on their wiki pages.
- Academia.edu links: I’m going to be extremely cautious about copyright. Some self-archived articles at Academia.edu may be infringing because it’s a commercial site and authors are required to grant a licence to use the material, which may conflict with their publishing contracts. I will be linking to self-archived articles at personal websites, institutional repositories, and non-commercial repositories such as Humanities Commons.
Coming later
- the first phase will be limited to Great Britain 1642-46 to keep things manageable and because this is where I have lots of information to share. I’ve always intended to extend the project to cover the whole of the British Civil Wars eventually, but I don’t know how soon that will be practical (although famous battles from the 2nd and 3rd civil wars will probably creep in sooner). In the more distant future, the project might expand beyond that, not least because the Thirty Years War is very relevant to the British Civil Wars, but that would create more complications and would need more funding, support and collaboration.
- data about legal cases and administrative events (such as committee meetings): there are no data structures for these yet, but I’ve been thinking about it, and it’s a fairly high priority. I want to test the existing data structures thoroughly before adding more.
- data about images and material objects: I definitely intend to include these eventually as they’re an important aspect of the history of the civil wars, but it’s something I know less about, and I can’t deal with everything at once. It would also potentially increase the number of wiki pages by quite a lot.
- Thomason Tracts: there will be data structures to represent early printed texts right from the start, but I won’t be importing data for all of the Thomason Tracts at once. This is partly because of difficulties with getting the data, and partly because it would lead to a huge number of wiki pages, which the server may or may not be able to cope with.
- there will also be data structures to represent parishes, townships and chapelries, but again I won’t be importing them en masse any time soon. This is mainly because I can’t find a dataset that’s good enough and legally reusable. I’ve decided that settlement names will be sufficient for indexing documents, so adding another 10,000 or so wiki pages to represent civil and ecclesiastical hierarchies is a lower priority.
- wills from TNA PROB 11 can be added one by one when they need to be linked to a person we’re interested in, but they will not be imported en masse because there are far too many of them. Also, I know that TNA are planning a project to improve their will data, so it would be a bad idea to copy the existing data beforehand.
- Text Creation Partnership texts that have been released to the public can be linked to or imported individually where they are particularly relevant, but again they won’t be imported en masse because there are too many of them and I suspect a lot of them are not very relevant.
I also intend to open up the wiki to contributions from other people, but not straight away. Crowdsourcing is not something to be undertaken lightly and has to be done right. Before I can invite everyone else to edit the wiki, I will need to:
- get other things out of the way so that I have more time to recruit and support volunteers
- test the wiki myself on a bigger scale so I can fix any obvious problems
- import more batches of seed data as it will be easier for other people if a lot of the pages are already there and don’t need to be created from scratch
- decide on rules for contributors and define the scope of the project more precisely
- write more help and documentation
- improve the editing interface
- be prepared to move to a more powerful server if the current one can’t cope with more concurrent editors or a larger number of pages (although I don’t really know how much the server will be able to take)