FactGrid talk:Musical Subscriptions

From FactGrid
Revision as of 14:08, 23 October 2021

First data input: 770 titles (authors, publishers, genres)

Dear Martin, dear Simon, the first input has been done, basically with the aim of disentangling the web: you have spotted 156,536 subscriptions on 770 titles. The titles are now on FactGrid with their dates and places of publication. I have not yet gone into the 490 authors. FactGrid knows more than 9,000 composers, but the automatic matching worked for only 90 of your 490 (see this Google Sheet: it has our composers, an automatic matching column and your authors). Some of your "authors" are actually poets (they still need to be created). Some are anonymous gentlemen; here you have to decide whether you want to give them database objects or simply leave the authorship question without a statement. Such items are worthwhile if you can still add more information on these unknown people.

I can create the 400 missing authors at any time, with slightly more information if that is available, or just as placeholders if that makes things easier. --Olaf Simons (talk) 13:22, 23 October 2021 (CEST)
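One plausible reason for a low automatic hit rate is that the same name is spelled differently in different sources. A minimal Python sketch of such a matching step, assuming a simple exact comparison of normalised names (case, accents and punctuation removed), might look like this. The function names `normalise_name` and `match_authors` and the placeholder Q numbers are hypothetical; this is not the procedure actually used on FactGrid.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def match_authors(authors, composers):
    """Map each author string to a composer's Q number, or None if no exact match.

    `composers` is a dict {composer name: Q number}; only normalised names are
    compared, so real spelling variants ("Dr. Arne") still fail to match.
    """
    index = {normalise_name(name): q for name, q in composers.items()}
    return {author: index.get(normalise_name(author)) for author in authors}


# Hypothetical example data; the Q numbers are placeholders, not FactGrid IDs.
composers = {"Charles Avison": "Q_PLACEHOLDER_1", "Thomas Augustine Arne": "Q_PLACEHOLDER_2"}
authors = ["Charles  Avison", "Dr. Arne", "A Gentleman"]
print(match_authors(authors, composers))
# {'Charles  Avison': 'Q_PLACEHOLDER_1', 'Dr. Arne': None, 'A Gentleman': None}
```

Entries such as "Dr. Arne" or an anonymous "A Gentleman" fall through and would have to be resolved by hand or with a fuzzier matching method.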

Upcoming data input: 156,536 subscriptions

I will send you a second spreadsheet with the subscribers. 14 of them already have numbers (always blue); the rest only have provisional placeholders that preserve the sequence (always red). Columns D and E (green) give the Q numbers of the subscribed works and, for reference, their short titles.

Creating database objects for all subscribers is theoretically no big issue: we could just give them English labels with the information from column C. The reason I hesitate is that we would not want the numbers or ordered sets in the labels. On the other hand, labels like "The Musical Society at the Castle Tavern in Pater—Noster—Row" or "Rev. Mr. Dovey, Prebend of the Cathedral at Lichfield" are actually quite useful for all these more or less anonymous people.

I took a look at the alphabetically sorted column to get a clearer idea of the number of duplicate records in the entire set. After a few steps of data homogenisation (unifying variants like "Mr.", "Mr", "[Mr.]", "&mdash[Mr.]", etc.), I got the impression that we might easily have some 1,000 to 3,000 people appearing again and again across the 770 books. I assume you want to know the frequent buyers. So what we should do is homogenise this list as far as you can get in a day's work. We should preserve the exact wording as printed for the literal statements on each publication. For the labels, however, we should get rid of variants like "The Hon." and "The Honourable". The more uniform these entries are, the easier it will be to sort them alphabetically and to let the spreadsheet spot all double or triple appearances of the same people.
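The homogenisation step described above could be sketched in Python roughly as follows. This is a minimal sketch, assuming that the variants only occur in the opening title of an entry; the variant table, the function names `homogenise` and `repeat_subscribers`, and the sample entries are hypothetical illustrations, not the list actually used. The homogenised form is only meant for labels and for spotting duplicates; the literal entry would still be kept for the statements on each publication.

```python
import re
from collections import Counter

# Hypothetical variant table: each pattern maps spelling variants of an
# opening title onto one canonical form. Extend it as new variants turn up.
VARIANTS = [
    # "Mr." / "Mr" / "[Mr.]" / "&mdash[Mr.]" -> "Mr. " (\b keeps "Mrs." untouched)
    (re.compile(r"^(?:&mdash;?)?\[?Mr\b\.?\]?\s*"), "Mr. "),
    # "The Hon." / "The Hon" / "The Honourable" -> "The Hon. "
    (re.compile(r"^The\s+Hon(?:ourable\b|\b\.?)\s*"), "The Hon. "),
]


def homogenise(entry: str) -> str:
    """Return a label-ready form of a subscriber entry (for sorting and counting only)."""
    text = " ".join(entry.split())  # collapse runs of whitespace
    for pattern, canonical in VARIANTS:
        text = pattern.sub(canonical, text)
    return text


def repeat_subscribers(entries, min_count=2):
    """Count homogenised entries and keep those appearing at least min_count times."""
    counts = Counter(homogenise(e) for e in entries)
    return {label: n for label, n in counts.items() if n >= min_count}


# Hypothetical sample entries, not taken from the actual subscription lists.
entries = [
    "Mr. John Smith", "Mr John Smith", "[Mr.] John Smith", "&mdash[Mr.] John Smith",
    "The Hon. Charles Howard", "The Honourable Charles Howard", "Mrs. Mary Hill",
]
print(repeat_subscribers(entries))
# {'Mr. John Smith': 4, 'The Hon. Charles Howard': 2}
```

Exact counting of this kind only finds people whose entries become identical after homogenisation; the same person described differently in two lists (for example with and without an office) would still need a human decision.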

Note: the labels are not database information; that will come in statements on the objects. So we should go for labels that make it easy for others to say: this is the very person I have just come across in my own research. Best. --Olaf Simons (talk) 14:08, 23 October 2021 (CEST)
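The distinction between a homogenised label and the literal wording kept in statements could be modelled roughly as below. This is a hypothetical Python sketch, not FactGrid's data model: `SubscriberItem`, `build_items`, the whitespace-only `collapse_whitespace` label function and the `Q_WORK_*` identifiers are all illustrative assumptions; in practice the label function would be a fuller homogenisation step.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberItem:
    label: str  # homogenised, human-readable label (not database information)
    # One (work Q number, literal wording as printed in that work) pair per subscription.
    statements: list = field(default_factory=list)


def collapse_whitespace(text: str) -> str:
    """Placeholder label function; a real one would also unify titles such as "Mr."."""
    return " ".join(text.split())


def build_items(rows, label_for=collapse_whitespace):
    """Group (literal entry, work Q number) rows into one item per label."""
    items = {}
    for literal, work_q in rows:
        label = label_for(literal)
        if label not in items:
            items[label] = SubscriberItem(label)
        items[label].statements.append((work_q, literal))  # literal kept verbatim
    return list(items.values())


# Hypothetical rows: the same subscriber printed slightly differently in two works.
rows = [
    ("Rev. Mr. Dovey, Prebend of the Cathedral at Lichfield", "Q_WORK_A"),
    ("Rev.  Mr. Dovey,  Prebend of the Cathedral at Lichfield", "Q_WORK_B"),
]
for item in build_items(rows):
    print(item.label, len(item.statements))
# Rev. Mr. Dovey, Prebend of the Cathedral at Lichfield 2
```

The label only serves to recognise the person; each statement still carries the wording exactly as it was printed in the respective work.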