User talk:Charles Faulhaber: Difference between revisions

From FactGrid

Revision as of 07:02, 3 May 2022

All FactGrid users

June 2019 First Edits

Dear Charles,

Here are the exemplary items you might try to refine:

There are probably some statements I would try to use if it was my project, especially the Property:P233/+Property:P234 option to create a stemma. I am not quite sure how this will work out. If you talk to computer people they might find that easy. Property:P233 states the previous position(s) in a genetic relationship and Property:P234 is a property to qualify the respective genealogical link like: translation, abbreviation and so on. We managed to get network graphs with these two properties but I'd love to have a visualisation that exploits the P106 date marker to create the tree...

There are other issues I did not try to solve. The Property:P476 identifier should be able to link straight through into the dataset. You can configure it if you know more about how your software is creating these dynamic links. (That is also the reason why I did not create the "Uniform Title IDno" - you people will know better how to create links into your database.)

I also did not go into details with the two texts of the manuscript. Item:Q164746 has some blanks "Input here" that just indicate we could do it, but I did not create the respective items because that would open ever new boxes. The language item "Castellano" might need specification (Spanish would also have been available, but I guess you have more specific terms for different historical periods like we have in German with Middle High German, or Middle Low German).

Best Wishes --Olaf Simons (talk) 13:42, 27 May 2020 (CEST)

Thanks, Olaf,

I'm still trying to figure out how to identify Properties to use as examples in the NEH proposal.

I will use this with you rather than e-mail....

A question:

Is it possible to copy a Property and then make changes in it? For example, copy P329 "Holding archive" to create another P with the label "Holding library"?

Distinguishing Qs at the model level

Hi Olaf,

I understand "Goethe (Q5879) author (P21) Faust (Q29478)"

How do I express this in terms of a data label, linking an entity "Q author (P21)" to "Q text"?

How about entity human (Q5) author (P21) Q ???


I was looking at the "New Property" page. I think that we are going to have to create a lot of new properties. Will they all have to be debated and discussed before they can be added?

Thanks again. This is fun,

Charles

The statement above is the other way around: Faust has properties and one is that Goethe is its author. See Item:Q3828 for a typical letter. (We did not yet go any deeper into books). Look at our Reasonator interpretation to see why this makes sense.
You can state Faust on Goethe's side with Property:P174 works published. But we did not start with this because it would kill the Items. Some people have written some 800 letters - if we state them on the person's item that will not be beautiful, especially since we will travel around with a messy subset of qualifiers: Goethe is the author of the letter x, he sent that to Y, Y lived in Z, he sent that letter on date B... So we put all this information on the objects and leave it to searches to tell us what a person has written (for example, all the objects that have J.J.C. Bode as author, some 800).
Item:Q7 Human: you need this to state that someone is a human being - like a category in previous databases. Author is mostly used in career statements.
If you want to feed bigger masses into FactGrid, I would recommend doing some five test items of each sort and looking at the searches they are supposed to answer. A good data model answers searches well. On this course we will create most of the Properties needed. You can create your own properties any time but I recommend doing that only after a couple of items we created together. There are stupid mistakes one can make, like not fully considering the data type. It is also good practice to state new properties on the common page so that the others can see them and begin to use them in their own research. --Olaf Simons (talk) 08:13, 31 May 2020 (CEST)
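
The point above, that authorship sits on the work and a search then collects what a person has written, can be sketched as a small query builder. This is only an illustration: the prefix IRIs are an assumption based on the usual Wikibase layout for database.factgrid.de and should be checked against the query service, and the function name is hypothetical. P21 (author) and Q5879 (Goethe) are taken from the discussion above.

```python
# Sketch only: builds a SPARQL query string; it does not contact any endpoint.
# Assumption: the prefix IRIs below follow the usual Wikibase layout for
# database.factgrid.de; verify them against the query service before use.
FG_ENTITY = "https://database.factgrid.de/entity/"
FG_DIRECT = "https://database.factgrid.de/prop/direct/"


def items_by_author_query(author_qid: str, author_property: str = "P21") -> str:
    """Return a query listing every item whose author statement points to author_qid."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Q-number: {author_qid!r}")
    return (
        f"PREFIX fg: <{FG_ENTITY}>\n"
        f"PREFIX fgt: <{FG_DIRECT}>\n"
        "SELECT ?item WHERE {\n"
        f"  ?item fgt:{author_property} fg:{author_qid} .\n"
        "}"
    )


# Q5879 is the Goethe item mentioned earlier in this thread.
query = items_by_author_query("Q5879")
print(query)
```

The query asks from the work's side ("which items name this person as author?"), so nothing has to be stored on the person's item.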

Hi Olaf,

I did finally figure out, with help from a friend in Spain, that the proper relationship is work P50 author/written by: Libro de buen amor (Q2283127) written by (P50) Juan Ruiz (Q434597). In fact, this is how we do it in PhiloBiblon, where the basic entity is the Work, to which authors, translators, and other Associated Persons are linked.

I don't intend to create anything, neither properties nor classes, for a good long time!

I will talk to you with Jens on Tuesday

1st Web-Meeting, 18 May 2021

Moved to FactGrid talk:PhiloBiblon

Database Objects and pages

The following is no database object - it is a Wiki page with the Q-Number of an object/Item:

https://database.factgrid.de/wiki/Q254471

To create an object you need to run the New Item procedure (Menu (which you did)). Better even: run that routine with a batch Fragment.

This is the database object - It has "Item" in the url:

https://database.factgrid.de/wiki/Item:Q254471
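
The difference between the two addresses can be made explicit with a tiny helper. This is a minimal sketch built only from the two URLs quoted above; the function names are hypothetical.

```python
import re

# Base address taken from the two URLs quoted above.
BASE = "https://database.factgrid.de/wiki/"


def item_url(qid: str) -> str:
    """Return the address of the database object (note the 'Item:' part)."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Q-number: {qid!r}")
    return f"{BASE}Item:{qid}"


def is_item_url(url: str) -> bool:
    """True only for addresses that point at a database object, not a plain wiki page."""
    return re.fullmatch(re.escape(BASE) + r"Item:Q[1-9]\d*", url) is not None


print(item_url("Q254471"))
print(is_item_url("https://database.factgrid.de/wiki/Q254471"))
```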

It makes no sense to date things on FactGrid (as you did on the talk page), as the system has a version history for anything that is far more precise and specific.

Maybe I can offer an online course on the Database. A look at the Help section is also useful to understand the database routines. (Improvements of the Help section are welcome.) I could give that course for anyone interested on Monday 24 May 2021 at 21:00 CET on our Webex Channel (easy to use).

Good descriptions

Dear Charles, as I just wrote in my e-mail: If you are using labels like "MS: Toledo: Biblioteca Capitular...." you will get hundreds of hits on the normal navigation that make no difference before character 31. Two ways out of that dilemma: (1) good, intuitive aliases and (2) descriptions that state what is special in the first 15 characters, the ones you see on display in thin letters, below the bold font labels. --Olaf Simons (talk) 08:19, 26 January 2022 (CET)
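
Before a batch upload, labels that are indistinguishable in their first 30 characters could be spotted with a check like the following. It is a rough sketch: the width of 30 follows from "no difference before character 31" above, the function name is hypothetical, and the sample labels are invented for illustration.

```python
from collections import defaultdict


def ambiguous_prefixes(labels, width=30):
    """Group labels by their first `width` characters and keep groups with more than one label."""
    groups = defaultdict(list)
    for label in labels:
        groups[label[:width]].append(label)
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


# Invented sample labels; real PhiloBiblon labels will differ.
sample = [
    "MS: Toledo: Biblioteca Capitular, example A",
    "MS: Toledo: Biblioteca Capitular, example B",
    "MS: Madrid: example C",
]
clashes = ambiguous_prefixes(sample)
for prefix, group in clashes.items():
    print(repr(prefix), len(group))
```

Every label reported here needs an alias or a description that puts the distinguishing detail first.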

Witnesses and Price

Morning Charles, I am not sure whether Item:Q394389 is meant to refer to a person (for which we have Q144912) in a law case or to a textual witness (Item:Q195274)? (The plural is odd) --Olaf Simons (talk) 08:32, 24 February 2022 (CET)

I also detected these two, Item:Q394388 and Item:Q246405, which might be the same. --Olaf Simons (talk) 08:55, 24 February 2022 (CET)

We need both textual witnesses Item:Q195274 and witnesses as individuals Q144912, although ours are usually witnesses of documents, not witnesses in a law case. In the former case, each record in our Analytic table is a textual witness. In the latter case, witnesses are found as Persons Associated with a text or with a textual witness. With regard to Item:Q394388 and Item:Q246405, they are different in my opinion. The latter is the actual price paid from one person to another; the former is the price established by law for a book. This is useful to know for book history. --Charles Faulhaber (talk) 19:56, 24 February 2022 (CET)

--I quote the "tasa" from the 1495 edition of Antonio de Nebrija's Latin-Spanish Dictionary: "Esta tassado este vocabulario por los muy altos | & muy poderosos principes el Rey & la Reyna | nuestros señores & por los del su muy alto con|sejo en cinco reales de plata."

The price of this vocabulary has been set by the most high and most powerful princes, the King and Queen our lords, and by those of their most high council in 5 silver reales

This is an official government action. It is different from the price one would pay to a bookdealer, where money changes hands. --Charles Faulhaber (talk) 06:09, 3 March 2022 (CET)

Property like Items

Dear Charles, Item:Q394873 - I do not quite see in what kind of statement such items can possibly occur. I do not even know what P2 or P3 statement to give them. Could you put them on an exemplary item? --Olaf Simons (talk) 10:39, 2 March 2022 (CET)

This was based on our discussion 3/1/22 on how to use Literature (P12) to list a Q# of secondary bibliography and then to qualify that Q# with Kind of Reference (P708). You created that property in order not to use Note (P73) followed by a string. You rightly pointed out that if we used "descrito en," we would not see "described in" if we changed the language to English. See Item:Q164502.

We will see how we get good wordings on these. So far they sound like properties - and we even have one of the same wording. That is why I was worried whether the information was not supposed to go on the Property:P411. --Olaf Simons (talk) 08:48, 3 March 2022 (CET)

Dear Olaf, it's clear that we are going to have doubles with the same wording in both P# and Q#. We need this flexibility in order to map PhiloBiblon into FactGrid. It is clear that we need a P# for each class (=column head), but then we have to decide whether the individual items in that class should be P# or Q#. We don't think that we can mix them: Either all P# in a given class or all Q#. If we say that we should have all P# in a given class, then we have to create those P# if they do not already exist. And you have convinced me that multiplying the number of P# is not a good idea: Occam's razor. --Charles Faulhaber (talk) 07:09, 4 March 2022 (CET)

Location of text in MS or edition

PhiloBiblon has a controlled vocabulary term ANALYTIC*TEXT_LOCCLASS to indicate the location in a manuscript or edition where a text is found.

Existing examples (e.g. Item:Q245760) link folio P100 or page P54 as statements directly to the MS or edition.

PhiloBiblon, I think, needs a P# called "Location in MS or edition" to serve as column header in the CSV file to facilitate mapping to FactGrid.

This would take as a Q# Leaf (Item:Q369869), Page/Pages (Item:Q164536), or Fly Leaf (Item:Q395036).
--Charles Faulhaber (talk) 23:40, 5 March 2022 (CET)
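
The proposed mapping could look roughly like this in a script. It is a sketch under assumptions: the column header is the proposed, not yet existing, property label, the lookup keys are the English item labels given above rather than actual ANALYTIC*TEXT_LOCCLASS values (which are not listed here), and the function name is hypothetical.

```python
import csv
import io

# Q-numbers as given above; keys are lower-cased English labels (an assumption).
LOCATION_CLASS_TO_QID = {
    "leaf": "Q369869",
    "page/pages": "Q164536",
    "fly leaf": "Q395036",
}


def map_location_column(csv_text, column="Location in MS or edition"):
    """Return the Q-number for each row of the given column, or None if unmapped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        key = row[column].strip().lower()
        result.append(LOCATION_CLASS_TO_QID.get(key))
    return result


# Invented sample rows for illustration.
sample_csv = "Location in MS or edition\nLeaf\nFly Leaf\nunknown\n"
mapped = map_location_column(sample_csv)
print(mapped)
```

Rows that come back as None are the ones that still need a vocabulary decision before upload.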

The miseries of our career statements

Dear Charles. The career statements are terrible so far. We started with an automatic input of German statements from lists. We then ran automatic translations, and here Deepl and Google did not care too much about the complex German spectrum. Thus Hilfsgeistlicher and Kaplan are both Chaplain in English, which is only more or less correct. We have hundreds of these. The best solution would be a systematic input of a system of occupations. There are projects that created such systems so that they can then put various language words on that systematic grid. I contacted a project in Cambridge which Christine Philliou's team had picked as an interesting partner, but the contact died down. Katrin Moeller in Halle is on a similar project but does not yet feel ready to make her system open source.

So: a difficult situation. Look at all the languages of an Item to decide whether these things are really double records (mere spelling variants for instance like German Lakai and Laqai) or whether they are so far not translated with the necessary nuances. --Olaf Simons (talk) 20:53, 26 March 2022 (CET)

Hi Olaf, if the two chaplains correspond to two different original understandings, then by all means they should be kept. I note that you make a distinction in German between Kaplan and Hilfsgeistlicher. For PhiloBiblon we'll use the latter Chaplain (Q43655). --Charles Faulhaber (talk) 21:17, 26 March 2022 (CET)

P73 notes

Dear Charles, The most important parameter is the present 1500 character limit. Both your strings are shorter and did not resist my input on the sandbox item Item:Q24. Why could I do it and you failed? I can only guess. My most common mistake is an accidental blank space at the end. I am also not sure whether it would swallow tabs. But generally I am trying to use the property very moderately as I realised that things get messy after three 1500 character notes. So test it again and watch out for blanks at the end. --Olaf Simons (talk) 08:35, 4 April 2022 (CEST)
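
The pitfalls named above can be checked before a note is submitted. A minimal sketch, assuming the 1500 character limit stated above; that tabs cause a rejection is only the suspicion voiced in the comment, not a confirmed rule, and the function name is hypothetical.

```python
NOTE_LIMIT = 1500  # present limit for P73 notes, as stated above


def check_note(text, limit=NOTE_LIMIT):
    """Return a list of likely problems with a note string (empty list if none found)."""
    problems = []
    if text != text.strip():
        problems.append("leading or trailing whitespace")
    if "\t" in text:
        # Suspected, not confirmed, to be a problem for the input.
        problems.append("tab character")
    if len(text.strip()) > limit:
        problems.append(f"longer than {limit} characters")
    return problems


print(check_note("A short note with a stray blank at the end "))
```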

The authors

Dear Charles, I am wondering about the authors. Would it make sense to import all Wikidata's Spanish authors 700 to 1900, just to have them with external identifiers and basic data?

Would there be specific external identifiers that would make it easy for you to map such an import against your authors? --Olaf Simons (talk) 22:57, 26 April 2022 (CEST)

The advantage of the input from Wikidata would be that all the place references would come with easy matching, since we have all the places with Wikidata references. So that could make the entire matching process a bit easier - if persons were easy/easier to match for you. --Olaf Simons (talk) 07:53, 27 April 2022 (CEST)

From Max: Adding q-items for authors doesn't help us in the short run - it actually makes our task more difficult. We have been operating under the assumption that we don't need to reconcile any of the base objects except geo. That is, we intend to create all the base objects (except geo) by script without checking first to see whether there is an existing object that we could/should use. So creating a bunch of author objects from a different source would mean that we need to be more careful when creating bio objects.

However, given that we expect work on PB to slow down significantly when the money runs out during the summer, I believe we will choose not to create the bulk of the PB base objects in FG in the summer time frame. Our current plan is:

1. Develop OpenRefine schemas against FG live (as Olaf has agreed), using a small subset of the base objects
2. Do a dump of the existing properties in FG along with all q-objects that have a PhiloBiblon ID (P476) -- PB base objects and dataclip objects
3. Load those objects into a beefy, more durable sandbox
4. Run OpenRefine using the full CSV dumps and the schemas we developed in step 1 -- to get a real sense of the dirty data problems we will (almost certainly) encounter and of the truly "problematic" parts of the PB schema
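
The selection described in the dump step of the plan above (all properties plus every q-object carrying a PhiloBiblon ID) might be sketched as follows. It assumes entities in the standard Wikibase JSON shape, with "type" and "claims" keys; the function names and sample entities are hypothetical.

```python
def has_philobiblon_id(entity, prop="P476"):
    """True if the entity carries at least one statement for the PhiloBiblon ID property."""
    return bool(entity.get("claims", {}).get(prop))


def select_for_sandbox(entities):
    """Keep every property, plus every item that has a PhiloBiblon ID."""
    return [
        entity
        for entity in entities
        if entity.get("type") == "property" or has_philobiblon_id(entity)
    ]


# Invented sample entities in a simplified Wikibase JSON shape.
sample_entities = [
    {"id": "P476", "type": "property", "claims": {}},
    {"id": "Q1", "type": "item", "claims": {"P476": [{"mainsnak": {}}]}},
    {"id": "Q2", "type": "item", "claims": {"P2": [{"mainsnak": {}}]}},
]
selected_ids = [entity["id"] for entity in select_for_sandbox(sample_entities)]
print(selected_ids)
```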

I hope we can get through the above before the money runs out. Having more objects to reconcile against will slow us down.--Charles Faulhaber (talk) 20:28, 28 April 2022 (CEST)

Well, it is the best solution for me. We should all learn how to do things with Open Refine. So let the expertise which you are gaining, flow. People will be eager to learn from you. --Olaf Simons (talk) 21:28, 28 April 2022 (CEST)

duplicate

Dear Charles, not sure whether these have different connotations in Spanish Q400622 / Q23209 (if not we should merge to the lower number) --Olaf Simons (talk) 08:02, 3 May 2022 (CEST)