FactGrid talk:Musical Subscriptions


First data input: 770 Titles — authors, publishers, Genres

Dear Martin, dear Simon, the first input is done, basically with the aim of disentangling the web. You have recorded 156,536 subscriptions to 770 titles. The titles are now on FactGrid with dates and places of publication. I have not yet gone into the 490 authors. FactGrid knows more than 9,000 composers, but the automatic matching worked for only 90 of your 490. (See this Google Sheet: it has our composers, an automatic matching column and your authors.) Some of your "authors" turn out to be poets rather than composers (they need to be created). Some are anonymous Gentlemen; here you have to decide whether you want to give them database objects or simply leave the authorship question without a statement. Such items are worthwhile if you can still add more information on these unknown people.

I will create the 400 missing authors shortly, with slightly more information if that is available, or just as placeholders if that facilitates things. --Olaf Simons (talk) 13:22, 23 October 2021 (CEST)

Upcoming data input: 156,536 subscriptions

The number of subscriptions is massive, leading into big data (by 18th-century standards): 770 titles meet 156,536 subscriptions. With the Q-numbers we can take weight off the spreadsheet that is circulating within the project. Each line is one subscription. The Q-number in Col. D states the work subscribed to on each line. Col. A is a running number to preserve the original sequence.

156,536 / 770 ≈ 203 subscriptions per title on average.

It is difficult to determine how many different "people" we should create on FactGrid. The present set gives the entries as they are stated in the various lists. We should note each subscription on the person's item with that original statement (Property:P35 is designed for this) and add a Property:P499 series number so that one can recreate any list.
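
A minimal sketch of how a running series number lets one recreate any list, assuming each subscription is held as a small record. The field names (`series`, `title_q`, `quote`), the placeholder Q-numbers and the "Mr. Smith" entry are hypothetical and not taken from the project spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: running series number, Q-number of the title,
# and the literal entry as printed in the subscription list.
subscriptions = [
    {"series": 3, "title_q": "Q_TITLE_B", "quote": "Mr. Smith"},
    {"series": 1, "title_q": "Q_TITLE_A",
     "quote": "Rev. Mr. Dovey, Prebend of the Cathedral at Lichfield"},
    {"series": 2, "title_q": "Q_TITLE_A", "quote": "A Lady"},
]

def rebuild_lists(rows):
    """Group rows by title and restore the printed order via the series number."""
    lists = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["series"]):
        lists[row["title_q"]].append(row["quote"])
    return dict(lists)

lists = rebuild_lists(subscriptions)
```

Because the literal quote travels with every row, merging several rows into one person item later would not lose the wording or the position of any individual entry.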

I played with the data for a day, trying to find out how many duplicate records we might have in this set. Each title is free of doubt, but we apparently do have frequent buyers (like the clearly identified royals); they should have one item per person, so that we see how much they buy. And we have other recurring entries where we are simply not supposed to know the individual subscriber, with a statement referring to an unnamed Lady or Gentleman. In these cases we might still use just one item, with a particular statement that these might be numerous persons: items that work as placeholders.

To get an estimate of the number of duplicate records I played with standardisations. Most people have the "Mr." before their name. Numerous publications prefer, however, to begin with the family name and to state the "Mr." after the given name or the initials. Many entries have further identifiers like place name, social rank, occupation, even employer, yet again there is no regularity in this. ("The Rt. Hon." can also be spelled out as "The Right honourable", or it can have a couple of variations like "The Rt. Honble".) Playing with standardisations here, getting rid of all variables (like the numbers of copies) there, and then asking the spreadsheet to eliminate all duplicate records gives a rough idea of the number of potential duplicates. I ended up with some 122,000 unique entries.
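
The standardisation described here was done in a spreadsheet; the following is only a rough sketch of the same idea in Python, under stated assumptions. The patterns cover just the variants named above, the sample entries ("John Smith", "Lord Example") are invented for illustration, and entries that put the family name first are not reordered.

```python
import re

# "The Right honourable", "The Rt. Honble" and "The Rt. Hon." become one form.
RT_HON = re.compile(
    r"the\s+(?:right\s+honourable|rt\.?\s+honble\.?|rt\.?\s+hon\.?)(?=[\s,]|$)",
    re.IGNORECASE,
)
# Trailing variables such as ", 2 Copies" or "(3 sets)" are dropped.
COPIES = re.compile(r",?\s*\(?\d+\s+(?:copies|sets)\)?\.?\s*$", re.IGNORECASE)
# "Mr.", "Mr" and "[Mr.]" are removed from the comparison key ("Mrs." is kept).
MISTER = re.compile(r"\[?\bMr\b\.?\]?,?", re.IGNORECASE)

def comparison_key(entry):
    """Return a simplified key used only for spotting likely duplicates."""
    key = COPIES.sub("", entry)
    key = RT_HON.sub("The Rt. Hon.", key)
    key = MISTER.sub(" ", key)
    key = re.sub(r"\s+", " ", key).strip(" ,")
    return key.casefold()

# Invented sample entries, not taken from the subscription lists.
entries = [
    "Mr. John Smith, 2 Copies",
    "[Mr.] John Smith",
    "Mr John Smith",
    "The Right honourable Lord Example",
    "The Rt. Honble Lord Example",
]
unique_keys = {comparison_key(e) for e in entries}
```

The key is deliberately lossy and is meant for counting only; the literal entry stays untouched for the statement on the item.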




I will send you a second spreadsheet with the subscribers. 14 have got numbers (always blue), the rest just provisional placeholders that will preserve the sequence (always red). Cols D and E (green) give the Q numbers for the works subscribed and, for memory, the respective short titles.

Creating database objects for all subscribers is theoretically no big issue. We could just give them English labels with the information of Col. C. The reason why I hesitate is that we would not want the numbers or ordered sets in the labels, but

  • "The Musical Society at the Castle Tavern in Pater—Noster—Row" or
  • "Rev. Mr. Dovey, Prebend of the Cathedral at Lichfield"

are very useful labels on all these (so far) more or less anonymous people, choirs and musical societies.

I took a look at the column with the alphabetical order to get a clearer idea of the number of duplicate records in the entire set. After a few steps of data homogenisation (unifying variants like "Mr.", "Mr", "[Mr.]", "—[Mr.]" etc.) I got the impression that we might easily have some 1,000 to 3,000 people appearing again and again in the 770 books. I assume you want to know the frequent buyers. So what we should do is homogenise this list as far as you get in a day's work. We should preserve the actual quote for an input of the literal statements on each publication. But for the labels we should get rid of variants like "The Hon." and "The Honourable". The more unified these entries appear, the easier it will be to sort them alphabetically and to let the spreadsheet spot all double or triple appearances of the same people.
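
As a sketch of this workflow, the snippet below derives a unified label from each literal quote, keeps the quote alongside it, and reports labels that occur more than once. The function names, the placeholder Q-numbers and the sample names ("Jones", "Miss Example") are hypothetical; only the listed "Mr." variants and the "The Honourable" / "The Hon." pair come from the discussion above.

```python
import re
from collections import defaultdict

# "Mr", "[Mr.]" and a dash-prefixed "[Mr.]" (U+2014) are unified to "Mr.".
MR_VARIANTS = re.compile(r"\u2014?\[?\bMr\b\.?\]?")
HONOURABLE = re.compile(r"\bThe\s+Honourable\b", re.IGNORECASE)

def make_label(quote):
    """Build a unified label; the literal quote itself is left unchanged."""
    label = MR_VARIANTS.sub("Mr.", quote)
    label = HONOURABLE.sub("The Hon.", label)
    return re.sub(r"\s+", " ", label).strip()

def recurring_subscribers(rows):
    """Map each label seen more than once to its (title, literal quote) pairs."""
    by_label = defaultdict(list)
    for title_q, quote in rows:
        by_label[make_label(quote)].append((title_q, quote))
    return {label: hits for label, hits in by_label.items() if len(hits) > 1}

# Invented sample rows: (Q-number of the title, literal entry).
rows = [
    ("Q_TITLE_A", "Mr Jones"),
    ("Q_TITLE_B", "\u2014[Mr.] Jones"),
    ("Q_TITLE_B", "The Honourable Miss Example"),
    ("Q_TITLE_C", "[Mr.] Jones"),
]
recurring = recurring_subscribers(rows)
```

A shared label is only a candidate match: two subscribers with the same unified label may still be different people, so the grouped quotes would need to be checked before items are merged.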

Note: the labels are not database information. That will come in statements on the objects. So we should go for labels that will make it easy for others to say: this is the very person I have just come across in my own research. Best. --Olaf Simons (talk) 14:08, 23 October 2021 (CEST)