FactGrid talk:Subscription lists

From FactGrid
Revision as of 16:45, 30 July 2024 by Olaf Simons

British Music Subscriptions

The present list has 156,536 lines, each recording one subscription, usually for a single copy and sometimes for several. We could create one item per subscription and then (almost) as many items for the individual subscribers, with almost identical information. The alternative would be a process of entity recognition. Here it would be useful to separate the following groups:

  1. Subscriptions that provide enough information to allow a provisional/actual identification
  2. Subscriptions that are essentially generic, e.g. Y96894: A Lady.

Subscriptions that provide enough information to allow a provisional/actual identification

Most of the subscriptions come with plausible identifiers: a family name, a gender marker (Miss, Mr., Seignor...), a given name, a place name, a profession. Some of the subscribers are organisations or companies (choirs, booksellers); most are individuals, in the full range from "The King" and "George Frederic Handel, Esq;" to "Lady Brown" (Y3536), who might (or might never) become identifiable in the hands of specialists.

It would be good to accumulate and to sort the information before we feed the data into FactGrid. Creating items one by one is painful. Merging items - thousands of items manually - is even more painful. The ideal data set is already an interpretation, though a transparent interpretation with indications of potential "Merger candidates" or with warnings that a separation of data might be necessary.

FactGrid knows 18th-century composers (noted in Wikidata), but the British prosopographic dataset is more or less empty.

The interesting part of the matching process will therefore be the identities that have to be created from within the 156,536 subscriptions themselves. The optimal solution is, however, a plausible and transparent set of proposed identities, created under rules such as:

  • 90% match - if names, professions and places are the same within three decades
  • 80% match - if full names and places are the same within three decades
  • 75% match - if last names and places are the same within three decades
  • 70% match - if status, name and place are matches
  • 65% match - if full names and professions are the same within three decades (though places differ)
  • 30% to under 65% match - "manual" look requested
  • under 30% match - simple item creation is proposed, though with identification of possible matches
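To make the rules above easier to discuss, here is a minimal Python sketch of how a pair of subscriptions could be scored. Everything in it is an assumption for illustration: the field names, the `Subscription` class, the reading of "within three decades" as a difference of at most 30 publication years, the reading of "name" in the 70% rule as the family name, and the boundary handling in `triage`. The rules only produce the fixed scores 90, 80, 75, 70 and 65; pairs that meet none of them get 0 here, since the list gives no rule for the lower range.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_YEARS = 30  # assumption: "three decades" = at most 30 years apart


@dataclass(frozen=True)
class Subscription:
    # Hypothetical fields; the real list may be structured differently.
    ident: str
    surname: Optional[str] = None
    given_name: Optional[str] = None
    place: Optional[str] = None
    profession: Optional[str] = None
    status: Optional[str] = None
    year: Optional[int] = None


def same(x: Optional[str], y: Optional[str]) -> bool:
    """Two values agree only if both are present and equal, ignoring case."""
    return x is not None and y is not None and x.strip().lower() == y.strip().lower()


def match_score(a: Subscription, b: Subscription) -> int:
    """Return the percentage of the first (highest) rule that the pair satisfies."""
    surname = same(a.surname, b.surname)
    full_name = surname and same(a.given_name, b.given_name)
    place = same(a.place, b.place)
    profession = same(a.profession, b.profession)
    status = same(a.status, b.status)
    in_window = (
        a.year is not None
        and b.year is not None
        and abs(a.year - b.year) <= WINDOW_YEARS
    )

    if in_window and full_name and profession and place:
        return 90
    if in_window and full_name and place:
        return 80
    if in_window and surname and place:
        return 75
    if status and surname and place:  # no date window stated for this rule
        return 70
    if in_window and full_name and profession:  # places differ at this point
        return 65
    return 0


def triage(score: int) -> str:
    """Map a score to the proposed handling (boundaries are an assumption)."""
    if score >= 65:
        return "merger candidate"
    if score >= 30:
        return "manual look"
    return "new item"
```

A pair such as two "Lady Brown" entries from the same place, published forty years apart, would score 70 under this reading, because only the status rule ignores the date window. Whether that is the intended behaviour is exactly the kind of question the rule set should settle before any data is fed into FactGrid.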

"Manual" scans of proposals would be assisted by subsets that present the data sorted by name, place, status or profession (with the respective dates of publication). One could in this case go through all candidates from York or Bath and decide with human knowledge.
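Such subsets could be produced with a few lines of Python. The sketch below is only an illustration: it assumes each subscription is available as a dictionary with hypothetical keys such as `place`, `surname` and `year`, groups the records by one chosen key, and sorts each group by family name and year of publication so that likely duplicates end up next to each other.

```python
from collections import defaultdict


def review_subsets(records, key):
    """Group records by `key`; sort each group by surname, then year."""
    groups = defaultdict(list)
    for rec in records:
        # Records without a value for the key are collected separately.
        groups[rec.get(key) or "(unknown)"].append(rec)
    for members in groups.values():
        members.sort(key=lambda r: (r.get("surname") or "", r.get("year") or 0))
    # Return the groups in alphabetical order of the grouping value.
    return dict(sorted(groups.items()))
```

Calling `review_subsets(records, "place")` would give one list per place, so a reviewer could work through all candidates from York or Bath in one pass; the same function with `"profession"` or `"status"` as the key gives the other views.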