User talk:Bruno Belhoste
- 1 Batch Fragment
- 2 First mass input
- 3 After the first input
- 4 N.N. for people without given names
- 5 100,000 - congratulations, by the way.
- 6 double entries
- 7 French places
- 8 Media-upload and 250 character limit
- 9 Occupations
- 10 classes and so on
- 11 different languages
- 12 Gender and Len/Den
- 13 Input of Statements
- 14 Aliases
- 15 la mobilisation générale de...
- 16 Data modeling for biographies
- 17 Merging items
- 18 French Identifiers
Dear Bruno. The following batch fragment might give you an idea of how the input is done to produce just a person. # has to be replaced by the name as it shall appear in headlines, $ has to be replaced by a short bio in French and English. That will be the tough thing. When I have longer lists I just machine-generate these bios from the birth and death dates and the places.
In the case of your next candidate that would be:
qid,Lfr,Len,Dfr,Den,Aen,P2,P154,P131
,"Philippe Jacques René de Berstett","Philippe Jacques René de Berstett","*1er octobre 1744 Berstett, +31 mars 1814 Offenburg, lieutenant au régiment de Nassau-Saarbrück en 1768, dernier Stettmeister de Strasbourg en 1789, président de la noblesse de l'Ortenau jusqu'en 1806","1 October 1744 Berstett, +31 March 1814 Offenburg, lieutenant in the Nassau-Saarbrück regiment in 1768, last Stettmeister of Strasbourg in 1789, president of the nobility of Ortenau until 1806","Philipp Jakob Reinhard von Berstett",Q7,Q18,Q99677
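For longer lists, rows like the one above could be machine-generated. The following is only a minimal sketch in Python, assuming the person data sits in a list of dictionaries; the field names `name`, `life_fr`, `life_en` and `alias_en` are invented for this illustration, while the column names and the Q-numbers in the last three columns are copied from the fragment above.

```python
import csv
import io

# Column order copied from the batch fragment above.
COLUMNS = ["qid", "Lfr", "Len", "Dfr", "Den", "Aen", "P2", "P154", "P131"]


def short_bio(birth, birth_place, death, death_place):
    """Build a minimal description from dates and places only."""
    return f"*{birth} {birth_place}, +{death} {death_place}"


def batch_csv(people):
    """Return CSV text with one header line and one row per person.

    The qid column is left empty, as in the fragment above.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for person in people:
        writer.writerow({
            "qid": "",
            "Lfr": person["name"],
            "Len": person["name"],
            "Dfr": short_bio(*person["life_fr"]),
            "Den": short_bio(*person["life_en"]),
            # Hypothetical field: an optional English alias.
            "Aen": person.get("alias_en", ""),
            # Q-numbers taken verbatim from the example row above.
            "P2": "Q7",
            "P154": "Q18",
            "P131": "Q99677",
        })
    return buffer.getvalue()
```

The `csv` module puts quotation marks around any field that contains a comma, so the generated descriptions stay in one column.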
First mass input
Dear Bruno - yes, the preparation of the input is the ugly part. My recommendation: Do not spend too much time on the Descriptions (Den, Dfr). If you have just dates and a place, that will be fine. You can overwrite these entries at a later time with information from the database.
The same goes for the names. Hard core name information comes with the P247, P248 entries. The headers (above) are only interesting to get the Q-Numbers for the correlation with your database numbers.
Once you know which number on your database is which Q-Number here you can do all the inputs step by step.
The following two searches give you all the given names and all the family names that are on the database right now:
I know how to use Excel to replace existing names with the respective Q-numbers if that helps - but there are other programs that do the same thing. Step 2 is then to create the new names (there will be quite a few with the first bigger French input). I can create these names for you, it is easy. Things get easier with every item the system is taught to use. --Olaf Simons (talk) 17:12, 30 November 2019 (CET) (signatures with dates and links are, by the way, left on pages such as this one with four tildes: ~~~~)
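The replacement of names by Q-numbers does not need Excel. As a rough sketch, assuming the result of the name searches has been exported as pairs of label and Q-number, a few lines of Python would do the same job; the function names and the Q-numbers in the usage note are invented placeholders, not real FactGrid items.

```python
def build_lookup(pairs):
    """Turn (label, Q-number) pairs into a dictionary keyed by normalised label."""
    return {label.strip().casefold(): qid for label, qid in pairs}


def to_qids(names, lookup):
    """Replace each name by its Q-number.

    Names without a match get an empty cell and are collected in
    `missing`, so that the new name items can be created in step 2.
    """
    matched, missing = [], []
    for name in names:
        key = name.strip().casefold()
        if key in lookup:
            matched.append(lookup[key])
        else:
            matched.append("")
            missing.append(name)
    return matched, missing
```

With `build_lookup([("Armand", "Q900001"), ("Louis", "Q900002")])` (placeholder numbers), a column holding "Armand", "Louis" and "René" would come back as two Q-numbers and one empty cell, with "René" listed as still to be created.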
After the first input
I have started uploading data about persons. It works well but I have some questions about what I have to insert.
1. about Alias. How can I insert 'Alias' in the system? In fact I think of two kinds of alias: the first one is alias or nicknames like : "duc de Lauzun" for "Armand Louis de Gontaut Biron"; the second one is the name of the person in a different order like : "Gontaut-Biron (de), Armand Louis" for "Armand Louis de Gontaut Biron". Does it comply with the general principles of Factgrid? It seems to me that it would be most convenient for searching a name.
2. At a more general level, I am wondering whether it is better to put in FactGrid all the persons of Harmonia Universalis (about 5000), or only the people who are directly concerned by the mesmerist movement (about 800). I have little information about the other people, who are mainly relatives, but they are important for building the network (I discovered many family connections). If I have the choice, I would put in all the persons, but I must be consistent with FactGrid rules.
I have another question about the properties: I noticed that some properties are duplicated in Wikidata and in FactGrid, for instance Date of birth, which is P77 (FactGrid) and P569 (Wikidata). And what is the meaning of Q80513? I don't understand how I should combine P77 and Q80513.
All the best, Bruno Belhoste
- This is looking very good. As to the Alias: use them as you find practical. I use them for name variants, also to give the long version while I use the short version (there are a lot of people who are not known by their five given names). But we have not started to set second names in the beginning.
- The second names will be given as statements and then you can do this practical search for all persons of the same family name
- Speaking of "unimportant people" - there are no such things. Give them all and you will see that others can suddenly identify them; that has been my experience since I joined this project. Stuff that is unimportant is simply not used by others until they make it important by using it. I love the margins of knowledge (and I think so do all the others on this site).
- The reference of parallel Wikidata-Numbers is for the future. We are working on the "federation on Wikibase-Instances" in which it will be easy to quote from other instances (like Wikidata or the GND or the BNF). That requires that we know the vocabularies of these instances. Wikidata has the most extended vocabulary so far (yet not so fine on historical documents). The DNB/GND is experimenting with their own first Wikibase installation. I will see them tomorrow. The BNF will decide this month. The software is presently interesting to many people and that is why we are trying to understand the languages of the first installations. --Olaf Simons (talk) 16:07, 2 December 2019 (CET)
- Aliases are set with Afr, Aen, Ade. If you have more than one, separate them with |. You can also set them later. --Olaf Simons (talk) 22:24, 2 December 2019 (CET)
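For a spreadsheet with several name variants per person, the cell for an alias column could be assembled as in this small Python sketch. It assumes the variants are held in a list; the function name is made up for this example, and it does not handle an alias that itself contains the | character.

```python
def alias_field(aliases):
    """Join several aliases into one cell, separated by '|'.

    Empty strings and repeated variants are dropped; order is kept.
    """
    cleaned = []
    for alias in aliases:
        alias = alias.strip()
        if alias and alias not in cleaned:
            cleaned.append(alias)
    return "|".join(cleaned)
```

For the two kinds of alias asked about above, `alias_field(["duc de Lauzun", "Gontaut-Biron (de), Armand Louis"])` gives one cell with both variants, which can go into the Afr column.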
N.N. for people without given names
Dear Bruno. Just a brief note. It might be good to prefix N.N. on people without given names, so that we do not mix them with family name items such as Item:Q27646 --Olaf Simons (talk) 20:19, 2 December 2019 (CET)
- Side remark for Martin Gollasch - we can set German labels and descriptions later in a machine run. --Olaf Simons (talk) 21:27, 2 December 2019 (CET)
- Those were only spot checks, I wanted to get an idea of the "cross border meta data" situation...--Martin Gollasch (talk) 09:53, 3 December 2019 (CET)
100,000 - congratulations, by the way.
- Conrad Alexandre Gérard, Item 100,000 on FactGrid
- Anton Franz Mesmer - merged Item:Q100266 into Item:Q1455.
- Jean-Baptiste Willermoz - merged Item:Q100336 into Item:Q76332
- Arnold Wienholt merged into Item:Q1334--Martin Gollasch (talk) 00:36, 5 January 2020 (CET)
- Emanuel Swedenborg into (Q14227) --Martin Gollasch (talk) 19:26, 15 January 2020 (CET)
- Christoph Willibald Gluck into (Q99773)
French places
Dear Bruno. This search gives you all places that are in the machine right now - on a map. Delete "#defaultView:Map" at the top of the input frame and you will have them in a list. I guess you will need far more than those. --Olaf Simons (talk) 14:28, 8 December 2019 (CET)
- and you do not have to give coordinates on persons. Just give coordinates to places, and the search input can be written in such a way that the coordinates come with the places. --Olaf Simons (talk) 14:35, 8 December 2019 (CET)
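The switch from map to list can also be done on a saved query text. The Python sketch below only illustrates the idea: the query is a stand-in with a placeholder written as `<COORDINATES_PROPERTY>`, not the actual search linked above and not a real FactGrid property, and the function name is an assumption.

```python
# Stand-in query; <COORDINATES_PROPERTY> is a placeholder, not a real property.
MAP_QUERY = """#defaultView:Map
SELECT ?place ?coordinates WHERE {
  ?place <COORDINATES_PROPERTY> ?coordinates .
}"""


def as_list_query(query):
    """Drop the '#defaultView:Map' line so the results show as a plain list."""
    kept = [line for line in query.splitlines()
            if line.strip() != "#defaultView:Map"]
    return "\n".join(kept)
```

Only the view directive on the first line is removed; the query itself stays the same, so the list and the map show the same places.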
Media-upload and 250 character limit
Dear Bruno. I had a session in Berlin with the software people from Wikimedia and two things might be of immediate interest for you: Properties with a string input as well as descriptions in the header are now open to input of up to 1500 characters (which is nice for title page transcripts).
You can now also upload media files - documents, contemporary engravings. The present wikibase version might not display them on the data sets but that will come with the next version update. --Olaf Simons (talk) 16:07, 17 January 2020 (CET)
Occupations
What is specifically German in the career of "Arzt"? Wouldn't it be better to define "Arzt" as a profession, equivalent to "physician" in English and "médecin" in French?
- Nothing. I confess I am somewhat unsure what to do with the career statements and professions. The colleague who is the expert here is Katrin Moeller in Halle, also a DH specialist. She has a full nomenclature which would work in all languages - and she collected contemporary terms to mark historical changes. I should contact her again. You are free to set foreign labels to all the professions. --Olaf Simons (talk) 14:44, 27 January 2020 (CET)
Q36783 ("painter") is both an instance of "profession" and an instance of "career statement". As an instance of "profession", it is also a class in the classification of professions (a subclass of "artistic profession") (and the class "painter" also has subclasses, like "painter miniaturist"). The statement date: "1848" makes sense for "painter" as a "career statement" but certainly not for "painter" as a class in the professions. To avoid this problem, I propose to put the statement date: "1848" as a qualifier of "painter" as an instance of "career statement" (or to delete it).--Bruno Belhoste (talk) 15:48, 29 January 2020 (CET)
- You are right about this. The story behind it is trivial. I started with "Jobs" hoping for the systematic approach and realised I don't have the time to get it done in a way a historian of that field will appreciate. So I just added the bigger word "career statement" to catch almost everything that needs to be dealt with later. If you want to get into the systematic solution, feel free. (Really, we need specialists of different fields working together. Social historians who publish on such questions should work on a solution which other projects could use, because this is really a wide problem for many platforms dealing with prosopographical data. It was not such a problem in the days in which we were just writing articles in one language...) --Olaf Simons (talk) 10:41, 11 March 2020 (CET)
Dear Bruno, I put this here: Grossrat should, of course, be Grand councillor or, better still, member of the Grand council. All the translations of Jobs are machine work. I pushed them into www.deepl.com so that we had less of a problem with them in English and German. They all came from three automatic inputs from German 19th century lists. Before the input I was aiming at a professional solution by the expert I knew, who has spent part of her life on a system to track professions through the early modern period. I still hope she will join the project, although the next conference where we would meet is about to be cancelled because of Corona... Still, I am confident we can win her.
So if you do find these mistakes just silently mend them, that is what I am doing whenever I see them.
The brightest thing would probably be a real project on these statements using contemporary dictionaries and adding definitions with the property for that. All this is a project for people in this field, since they can win prestige with work devoted to the system of job-nomenclatures and how they changed on the European map. --Olaf Simons (talk) 10:41, 11 March 2020 (CET) --ok, it makes sense Bruno Belhoste (talk) 11:05, 11 March 2020 (CET)
classes and so on
I have just looked into a presumably simple thing (simple for us who have such titles): academic degrees and permissions to teach. But first a word about the class thing.
Your idea to have a P420 referring downwards and a P421 referring upwards in the class hierarchy - left me with the feeling that you will then need a P422 to actually make a specific class statement.
So we could say (Q22224) professor is (P2) a general career statement (Q37073O).
That is nice because you eventually have a broad basin of career statements in which you will be able to follow trends.
We might now use P422 "class" to state that professor is in the class of "academic titles". Now see the spreadsheet. I moved all these titles to the top to get a clearer idea about what to do with P420 and P421.
The "professor of mathematics" - is more precisely an "academic title and teaching qualification" (well, the simple professor also has a teaching qualification, it is just not stated...), so change Q37073O to "academic title and teaching qualification". You could use your classes to organise these people by faculties (in a world of changing faculties... ugly job but possible).
We have some 89 of these (including composite teaching permissions) (we could split those but would lose historical trends of unions). Do you really want to list some 500 which we might get with a P420 option? I would rather use the P309 property to generate a list of content for a class. The look upwards (P421) is more interesting. Level up you state in which field(s) you are with a specific class (and you can still let the machine do the level down search whenever you want the list).
To entertain you a bit with the mess: we have the "professor of mathematics" - that is a title, a permission to teach and it is probably what the man actually did, his job for a time, a career statement.
What was his university degree? A "habilitation" in Germany. Do we want to state that or is it self evident? Let us say it is self evident in order to have a better life. What is the proper P-Number to state "Professor of mathematics" in a person's data set? Just job? My proposal was employment - employed at, to be precise. (Here I listed the universities and, with qualifiers, the positions, and added dates for employments.) But I also had machine run inputs from lists where I just brought the "career statements" into FactGrid with a P165 statement. This was nice for widows, retired people and so on. I just threw in what they had declared where they were put on a list of registered people. And what if you do not know more? You do not need a university to be a professor by rank, once you have taken the step.
So to sum it up:
- I could live with the binary system of class (P420 [now P422]) and next higher level (P421) because I was ready to use specified database searches for the list of "in this category".
- a good system might be tested with the 89 academic titles items which we have - one can shift and group them quickly, inputs are easy.
- the practical use will depend on test cases of people. My decision to list employments and to qualify positions in the employments is good CV practice but not the thing we can always do with the little data we have on persons.
- I am happy to learn from the lives which you will create and the class systems which you will produce. I avoided classifications (as you realised). --Olaf Simons (talk) 11:58, 29 January 2020 (CET)
Thank you, dear Olaf, for your help. I had a look at the page https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Classes and I now understand better what the concept of class means in this context. First of all, the concepts of set and group don't exist there. There are only two basic concepts: item and class. An item can be put in a given class by using P2 ("instance of"). The same item can also be a class by using P421 ("subclass of"). That's all. As you explained to me, P420 is useless and can be deleted. Moreover, P422 ("class") is useless too, because it is redundant with P421. In a classification, we only need to define the upper class as an instance of "basic object" (Q22) by using P2. All the other classes in the classification are defined as subclasses of this upper class, directly or indirectly by a chain of "subclass of". Let me take the case of the "FactGrid properties". "FactGrid properties" is an instance of "basic object" and "FactGrid properties for material objects" is an instance of "class of FactGrid properties", which is also an instance of "basic object". That is absolutely correct. However, "FactGrid properties for material objects" is also a subclass of "FactGrid properties". In this case, such a statement is not very useful per se, because there are no sub-subclasses in the classification of "FactGrid properties", but, on the other hand, if you use it, you avoid creating the new item "class of FactGrid properties" and it is probably a better practice. Let me now take the case of (Q22224) professor. It is (P2) a general career statement (Q37073O) but it is also a subclass of (P421) the class "academic titles" and because it is a subclass, it is itself a class (this is why it is not necessary to define Q22224 as a class by using P422 ("class")). I agree completely with your distinction between "job" and "employed at". --Bruno Belhoste (talk) 13:50, 29 January 2020 (CET)
- Wikidata does not have to be our hero. The interesting thing about Wikibase is that it is completely triple based. So there is no need to think like you would in a relational database. You can make statements as soon as you feel they could help to give an object its shape and size and smell and relationships or whenever you get the data.
- What is good about all this is that you can generate classification systems later - the statements are not on a new structural level. So no reason to worry. (Same goes for the jobs: If someone has a system to organise this - my hopes lie on Katrin Moeller - we will just import the system as a class system and map items.) --Olaf Simons (talk) 14:13, 29 January 2020 (CET)
- The bottom-up approach is a huge asset of the system, but it's good also to avoid the mess. The concept of class is basic and powerful (for queries) and I still think we'd better follow the rules given by Wikibase, but you know that better than me!--Bruno Belhoste (talk) 14:50, 29 January 2020 (CET)
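To make the point about classes and queries a little more concrete: if every class only states its next higher level (P421, "subclass of"), the full chain upwards can still be computed whenever it is needed. The Python sketch below uses plain labels instead of Q-numbers and a small dictionary as a stand-in for the database; the two statements in the usage note follow the "painter" example above, everything else is an assumption for illustration.

```python
def superclasses(item, subclass_of):
    """Return every class reachable from `item` by a chain of 'subclass of'.

    `subclass_of` maps a class to the list of its next higher classes.
    """
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also guards against accidental loops
                seen.add(parent)
                stack.append(parent)
    return seen
```

With the statements "painter miniaturist" subclass of "painter" and "painter" subclass of "artistic profession", `superclasses("painter miniaturist", ...)` returns both higher classes, although only the direct link was entered on each item. The downward list for a class can be produced by the same kind of search in the other direction.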
The problem is that we do not appear together in the same searches...
- have a nice evening --Olaf Simons (talk) 20:51, 31 January 2020 (CET)
Gender and Len/Den
- I will correct the gender thing.. --Olaf Simons (talk) 23:30, 13 February 2020 (CET)
Input of Statements
Dear Bruno, just briefly: If you use a Google spreadsheet for the further input of statements I can help you standardising the columns. The big issue is to match the texts in the fields with the FactGrid Q-Numbers, but I have gained some expertise to help here. --Olaf Simons (talk) 09:44, 24 February 2020 (CET) -- Thank you for this proposition, dear Olaf. I will show you my next input (on a Google spreadsheet) before launching the process. It will be the batch 2500th-3000th person-items.--Bruno Belhoste (talk) 10:51, 24 February 2020 (CET)
la mobilisation générale de...
Dear Bruno, if I were more of a Romantic spirit, I would see the latest input with dark anticipations. The entire French army is mustering at the horizon. Instead I am delighted. Imagine one had the names of all those who met in these units... that would be a great thing. --Olaf Simons (talk) 05:55, 19 May 2020 (CEST) -- I don't like the army either but unfortunately most of my guys are military officers... and it's exactly what happened: they met in their unit. In fact, I needed only some regiments, but I put in (almost) all of them to be exhaustive. --Bruno Belhoste (talk) 12:00, 19 May 2020 (CEST)
Data modeling for biographies
Dear Bruno, dear Martin and dear Germania Sacra team - something we might discuss together:
Dear Olaf, you're right, it's a crucial point, and not easy at all. I will think about it. My own project is on the verge of including many biographical pieces of information involving events, institutions, timespans, locations and so on. -- Bruno Belhoste (talk) 18:43, 22 May 2020 (CEST)
Merging items
Dear Bruno, before merging items, check with "What links here" which of them has more links, and merge preferably into the one with the largest number of links, so that it remains easy to correct the links to the new direction. Correcting links is good practice, since SPARQL searches will otherwise continue to list the old Q-numbers, now without labels. (Correcting such links is particularly ugly where they lead to qualifiers... yet it is good that you discover all these double records. I regret I could still not win Katrin Moeller in Halle for the professions.) --Olaf Simons (talk) 11:43, 29 May 2020 (CEST) -- Thank you for the advice. I will do it --Bruno Belhoste (talk) 13:10, 29 May 2020 (CEST)
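The rule of thumb for the merge direction can be written down in a few lines. This Python sketch assumes the number of incoming links has already been read off "What links here" for each duplicate; the function name and the link counts in the usage note are invented for illustration.

```python
def merge_direction(link_counts):
    """Return (target, sources): keep the item with the most incoming links.

    `link_counts` maps each duplicate's Q-number to its number of incoming
    links. On a tie, the item listed first is kept.
    """
    ranked = sorted(link_counts, key=lambda qid: link_counts[qid], reverse=True)
    return ranked[0], ranked[1:]
```

With made-up counts such as `{"Q100266": 3, "Q1455": 40}`, the function returns Q1455 as the target and Q100266 as the item to merge into it, so that only the three links of the smaller item need correcting afterwards.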
French Identifiers
Sorry, I had not realized that BnF was not working anymore in FactGrid... Do you actually have a preference for either SUDOC or BnF?--Martin Gollasch (talk) 09:48, 20 July 2020 (CEST) I think Data BnF is the best French reference -- 09:54, 20 July 2020 (CEST) But BnF ID is still working too -- Bruno Belhoste (talk) 09:59, 20 July 2020 (CEST)
- I try to reduce duplicate pages in FactGrid by setting external identifiers. The hardest is the Wikidata Q-number, which bars other pages from setting this number. Actually you don't need an explanation for an item like "Physician" otherwise. If I don't find a GND identifier I will set "Data BnF" from now on, especially in items connected with France.--Martin Gollasch (talk) 10:48, 20 July 2020 (CEST)