FactGrid talk:Buildings data model

Bruno Belhoste

Discussion moved from User talk:Bruno Belhoste#Houses.

Dear Daniel,

As you know, data modelling is a matter of philosophy. Your data model is nice, but it needs three special new properties. This may make the user's task easier, but it has the disadvantage of multiplying the number of properties. In my opinion, this is not desirable because it is almost impossible to create an ontology of properties with Wikibase. If you create special properties for addresses, why not create special properties for everything else? In the end you will have the huge bunch of properties of Wikidata: a nightmare.

In the case of addresses, if you have a single, very simple property like P47 for the location of houses, it is sufficient to add the triple ?street wdt:P2 wd:Q147195 to the triple ?address wdt:P47 ?street to get the street of the address. It is one line more, but one property less, and I think that is preferable.
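
As a rough sketch of the two-triple pattern described above, the following query lists addresses together with the street they are located on. It assumes that the query service predefines the wd: and wdt: prefixes (as the inline snippets imply) and the usual Wikibase label service, and it takes Q147195 to be the class item for streets, as in the triple above. The variable names are only illustrative.

```sparql
# Sketch: addresses and their street via the general location property P47.
# Assumption: wd:/wdt: prefixes and the label service are predefined.
SELECT ?address ?addressLabel ?street ?streetLabel WHERE {
  ?address wdt:P47 ?street .        # the address is located at ?street
  ?street  wdt:P2  wd:Q147195 .     # keep only P47 targets that are streets
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```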

For the number of the house, I would simply add a qualifier P90 (number as a string) to the statement P47->Qid, where Qid is a street, and write in queries: ?address p:P47 [ ps:P47 wd:Qid ; pq:P90 ?number ] instead of: ?address wdt:P522 wd:Qid ; wdt:P152 ?number.
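
Written out in full, the qualifier pattern could look like the sketch below. It assumes the standard Wikibase statement prefixes (p:, ps:, pq:) are predefined and again treats Q147195 as the street class. Instead of a fixed Qid it leaves the street as a variable, so it returns every address with its street and number.

```sparql
# Sketch: house number stored as a P90 qualifier on the P47 (location) statement.
SELECT ?address ?street ?number WHERE {
  ?address p:P47 ?statement .       # the full statement node, not just its value
  ?statement ps:P47 ?street ;       # main value: the street of the address
             pq:P90 ?number .       # qualifier: number as a string
  ?street wdt:P2 wd:Q147195 .       # restrict to streets (assumed class item)
}
LIMIT 100
```

To restrict the result to one street, ?street can be replaced by the wd: identifier of that street.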

As I said at the beginning, this is a matter of philosophy. However, there is another objection to your data modelling: there are already many items using the data modelling with P47 (for Paris, Leipzig, Gotha) and it would be reasonable to follow the same pattern for your own project. --Bruno Belhoste (talk) 14:06, 7 May 2024 (CEST)

Dear Bruno, thanks for the reply. I understand your point of view in the case of addresses. It really is a matter of philosophy. However, I also have one: hierarchical structure. That means that if we have a statement location:quarter, we do not need location:city, because we can retrieve the city with a query. It is one line more in the query, but one statement less. I think that if we put everything (street, quarter, city) into P47 (location), we actually create redundancy. So I think that more properties are preferable to redundancy.
Also, there is quite a serious problem with house numbers (P152 has existed for a long time). For example, this house had the number 56 until 1920, while the Jewish Quarter had the status of an autonomous municipality. When it was merged with the "Christian" town, the houses in the Jewish Quarter received new numbers (612 in our case). However, I cannot add a qualifier to a qualifier. --Daniel Baránek (talk) 14:36, 7 May 2024 (CEST)
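
The "one line more in the query" for a hierarchical structure can be sketched with a SPARQL property path. This is only an illustration under the assumption that each level (street, quarter, city) points to the next higher level with P47. With dedicated properties for each level, the path would have to name those properties instead.

```sparql
# Sketch: walk up the location hierarchy instead of storing every level.
# Assumption: each place links to its parent place with P47.
SELECT ?address ?street ?ancestor ?ancestorLabel WHERE {
  ?address wdt:P47 ?street .        # only the most specific location is stored
  ?street  wdt:P2  wd:Q147195 .     # that location is a street (assumed class item)
  ?street  wdt:P47+ ?ancestor .     # one or more steps up: quarter, city, ...
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```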
Dear Daniel, I agree that the change of house numbers over time is a serious objection against the use of a qualifier. I had the same problem of house number changes with 19th century Paris, but in this case the number is inside the label (see for instance Item:Q385847), which means that you can have two items in the database for the same house, depending on the period of time (with statements using P6 and P7 between them). I am interested in a better solution. Unfortunately, your data model does not solve my problem because buildings in Paris do not have specific names and are only identified by their numbers, which can change over time (by the way, this is a difficult problem for historians who want to identify a place in a street by its number in 19th century Paris). In your case, the way to do it with qualifiers would be two different statements for the street, each with its own qualifiers for the number and for the dates. This solution is not very elegant, but it is quite common in FactGrid for many cases.
On the other hand, I don't think that there is redundancy when you only use P47. In fact, as you say, quarter and city are not necessary in P47 statements. They are added only for convenience, to make item pages more readable and queries easier, and they can be removed without any problem. --Bruno Belhoste (talk) 15:59, 7 May 2024 (CEST)
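
A query over the "two statements, two sets of qualifiers" variant might look like the sketch below. Since the date qualifier properties are not named in this discussion, the query does not guess them. It lists every additional qualifier found on each P47 statement, using the wikibase:qualifier link from a property to its pq: predicate (standard in the Wikibase RDF export). The variable names are illustrative.

```sparql
# Sketch: one row per P47 statement, with its number and any further qualifiers
# (for example dates), so that two numbers for the same house stay distinguishable.
SELECT ?address ?street ?number ?qualifierProperty ?value WHERE {
  ?address p:P47 ?statement .
  ?statement ps:P47 ?street ;
             pq:P90 ?number .                        # number of this statement
  OPTIONAL {
    ?statement ?qualifier ?value .                   # any other qualifier ...
    ?qualifierProperty wikibase:qualifier ?qualifier .
    FILTER(?qualifier != pq:P90)                     # ... except the number itself
  }
}
ORDER BY ?address ?number
LIMIT 100
```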


Dear Bruno, I had a similar problem for Aschersleben and Quedlinburg. For Aschersleben I found this solution too late, but for Quedlinburg I solved the numbering problem by putting the numbers in the alias, as is common in other projects, and in P34 statements with the different names (so street and number), combined with qualifiers for dates and the house numbering system (P646), to be able to navigate between the different numbering systems. For a small town like Quedlinburg, with only one change of the numbering system in the 19th century, that worked pretty well to keep the different numbers of a house on one item; I have no idea how handy that may be for Paris. Here is an example: Item:Q896272 David Löblich (talk) 16:14, 7 May 2024 (CEST)
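
A minimal sketch of how such P34 statements could be read back, assuming that each P34 statement carries the name (street and number) as its value and P646 as the qualifier for the numbering system. Date qualifiers are left out because their property numbers are not given here.

```sparql
# Sketch: all names of a house with the numbering system each name belongs to.
SELECT ?house ?name ?numberingSystem WHERE {
  ?house p:P34 ?nameStatement .
  ?nameStatement ps:P34 ?name .                           # e.g. street and number
  OPTIONAL { ?nameStatement pq:P646 ?numberingSystem . }  # house numbering system
}
ORDER BY ?house
LIMIT 100
```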

Laurenz Stapf

Copied from User talk:Laurenz Stapf#Houses. --Daniel Baránek (talk) 14:38, 7 May 2024 (CEST)

Dear Daniel, I think the structure in the Leipzig data is even a direct inheritance from the structure for Paris (item:Q314208). I would welcome a split. Actually, the established mixed use of property:P47 in our Leipzig data could use an upgrade. Furthermore, my dear colleague user:David Löblich has built it for Aschersleben (item:Q497317), with a view to the neighborhoods, via a property:P8 workflow, which is good for easy bottom-up query pipelines. Our student workshop next month could generate some further thoughts about defining neighborhoods in Leipzig. Your properties look easy to apply for that. See: user:Florian Linsel user:Olaf Simons My latest approach was to extract "Streets" out of the location (look here: Locations per Street). We would have to fix it for Leipzig for around 4000 "addresses". --Laurenz Stapf (talk) 13:34, 7 May 2024 (CEST)

Dear Bruno, dear Daniel, I added an example from the Leipzig data, item:Q482685, for which we experimented with some more ways to visualize the house on a map. Even cadastral numbering got a place with address names (property:P153) because, like older "sounding" house names, it is a more static identifier of a house than the changing numbering regimes. Because "naming and cadastral numbering" is a static qualifier of a house, not necessarily bound to a street, I prefer to keep the item in the first row and add more qualifiers such as duration of use. How to represent the continuity of a "place" up to the present day, i.e. the splitting and merging of real estate, we have not finally decided, given the limited need and the uncertainty of the limited sources. --Laurenz Stapf (talk) 15:35, 7 May 2024 (CEST)

Olaf Simons

I can understand the different approaches. We started with many specialised properties and we are moving towards fewer Properties with more statements on the items that can then balance the lack of complexity. The advantage of fewer properties is

  • simplicity during input: Use just one property on a wide range of objects but put complex statements on the objects and thanks to SPARQL you can grab any range in this ensemble. Think of a market place in your town, should you not have an extra property for that? One Property for all just does the job, you can have all the houses on a street or market as in Item:Q15440. I could actually do without the location Gotha because "Gotha, Mönchelsgasse" has again this location.
  • Simplicity in searches - all the addresses on Mönchelsgasse - you get these addresses on P47 just as you would get them on the new Property:P522 - the same collection, so why an additional property?
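
The claim that both properties yield the same collection can be checked with a sketch like the following, which collects street assignments through either P47 or P522 and records which property supplied each row. It assumes Q147195 is the class item for streets, as in Bruno's example above, and the column name ?viaProperty is made up for illustration.

```sparql
# Sketch: compare the general location property (P47) with the street property (P522).
SELECT ?address ?street ?viaProperty WHERE {
  {
    ?address wdt:P47 ?street .
    BIND("P47" AS ?viaProperty)
  }
  UNION
  {
    ?address wdt:P522 ?street .
    BIND("P522" AS ?viaProperty)
  }
  ?street wdt:P2 wd:Q147195 .       # only count targets that are streets
}
ORDER BY ?street ?address
LIMIT 200
```

Addresses that appear with only one of the two values are the ones where the models diverge.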

The advantage of special properties lies, I think, mostly in the precision with which we can run a fast check for missing statements that we want to make sure we have. Wikidata's P31 (the equivalent of our P2) is messy when it comes to places. No one has a clear idea of the ontology; it is hard to guess which Q### will actually grab all human settlements of the Czech Republic because the subcategories are not well controlled (and they are actually very difficult to control). Here it is easier for us to grab the selection with one basic P-Q statement, P2-Item:Q8, and with minor properties that catch various aspects of Q8 settlements, although a strict ontology would do this very job. The strict ontology of subcategories is a philosophical impossibility. We do not live in a cosmos ruled by Aristotle and Item:Q550783. Our cosmos is more one of various aspects we see in things in various contexts.

My philosophical question here is: is being a house in a particular street really that much different from being a house in a city quarter and being that house in a street? Here I can see that one property for location and complex statements on the locations will do my job better than properties that will always miss nuances. The street property will always look awkward on a market place address, and you do not know whether a street/place/crescent property would be better in this field. Bruno's second argument is not to be neglected: since we have full models for Paris and Leipzig, we need to know the advantage of the additional properties, namely the searches they will make possible. --Olaf Simons (talk) 16:11, 7 May 2024 (CEST)

P152 "house name" (not "house number")

Property:P152 is a special thing and I am no longer happy with it. I would use Property:P34 if I had to give it a fresh start. I created this property because Gotha used to number its houses from 1 to 1300 in various systems. I can often hardly say whether and where these numbers were stable; sometimes they are, sometimes they are not. What I can say is that house names are strangely stable, but I hardly found them in the three registers which I exploited for Gotha's houses. I would not put house numbers on them, and I think it is good practice to connect house numbers with street names wherever that is the new practice. --Olaf Simons (talk) 17:25, 7 May 2024 (CEST)

P2 - Item:Q16200 "Real estate"/ "Liegenschaft"

I would avoid the identification of buildings. The continuity of buildings is questionable. The continuity of a lot, the space on which the building is erected seemed preferable in my eyes. --17:25, 7 May 2024 (CEST)

Compatibility

Many thanks to all of you for your views and experience. Before I try to summarize how to continue, I will mention some more aspects and my specific needs.

I am developing this application: https://map.kehillot.eu . For now, it uses static (exported) data, but I want to make it dynamic. For this, I need to match the data from OpenHistoricalMap with FactGrid. OHM uses an XML structure with keys for city, neighbourhood, street and house number. The easiest solution would be to have a special property for each of these keys. This is not strictly necessary, since matching can also be done in several other ways. However, I still need a property for the house number. The string (like in Q15440) is not sufficient for many reasons.

One of the reasons is that in the Austrian and Czech numbering systems, houses in cities actually have two numbers: the Konskriptionsnummer, a number within a cadastral area that changes only if cadastres are merged, and the street number, which can change quite "often". If we have only a string, how can we humans and (maybe more importantly) machines know which number it is?

I understand the reason for "flat" modelling (P2:Q8). However, I still need to distinguish places with the same name. I can have Leipzig, Innerstadt Leipzig etc. Is there any way of modelling this distinction? Labels or string addresses are for us humans, not for SPARQL and machine processing. --Daniel Baránek (talk) 18:16, 7 May 2024 (CEST)

Looking at your web project I see the practical reasons, and I think that FactGrid should help researchers to do their job without complications. Additional properties do not harm existing projects. In the worst case they create ambiguity for people entering and retrieving data, and they can create islands for projects that leave the common language. As to the house number property, I should sleep on it. I could move the existing house names to P34, so that you would be free to do with P152 whatever you want. We have a regular number property in Property:P90 and we should be clear about why we need a split, but that is a matter of definitions which we would have to set.
User:Laurenz Stapf has far more properties in use like the property on (Wikimedia based) document-reproductions. His model of going into apartments is very refined and should become a common standard. If he has additional wishes as he noted we should coordinate these, so that the additional properties will suit all the people.
I see Bruno's point that we can generate any selection also on a limited model. These are visualisations of addresses in Gotha with Bruno's Viewer:
The second is superior as it leaves all the data of people on their items (where I had put the people on houses in my first approach). Here Laurenz went a whole step further with a look at apartments in houses, and these decisions should serve as models. --Olaf Simons (talk) 23:51, 7 May 2024 (CEST)