Help:How do I feed data into FactGrid?

From FactGrid

https://www.wikidata.org/wiki/Help:QuickStatements

Creating Items one by one

The link to the input mask is given in the left menu.

With this form, you can create any imaginable database object, called an "Item". Items have Q-numbers and should come with a label in English and a description in at least one language. The English label is necessary because the database runs on a default English setting for everyone who is not logged in (and who is hence unable to change languages in the preference menu that comes with an account).

The label creates a link into human language. The description is needed so that you can decide which of two Items of the same name (e.g. two people of the same name) to link to.

Two or three standards have evolved for labels on FactGrid:

  • A typical title: John Doe, History does not write itself (New York, 2001).
  • A typical person: Erika Doe (née Musterfrau)
  • A typical letter: Letter Johann Adam Weishaupt to Ernst II. Ludwig von Sachsen-Gotha-Altenburg, Sandersdorf, 1783-10-14

Descriptions for persons follow a standard in which we first state birth and death, as these turn out to be good separators between people of the same name. The brief description for Adam Weishaupt reads: "* 6 February 1748 Ingolstadt, + 18 November 1830 Gotha, professor of law, writer and privy councillor, member of the Illuminati under the code names Spartacus and Scipio Aemilianus" (labels and descriptions have to stay under 250 characters).

Documents are well served with a short summary that will be useful in displays such as this one: table of the correspondence of Friedrich Christian Rudorf.

If you have name variants, you can enter them in the header's alias section. Aliases make the Item findable under these variants in regular Item searches.
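The length rule can be checked before any input. The following is a minimal sketch, assuming that the limit is counted in characters the way Python counts them (the database may count differently); the function name `header_problems` is hypothetical and not part of FactGrid or Wikibase.

```python
MAX_LEN = 250  # labels and descriptions have to stay under this length


def header_problems(label, description):
    """Return a list of problems; an empty list means both texts are acceptable."""
    problems = []
    for name, text in (("label", label), ("description", description)):
        if not text.strip():
            problems.append(f"{name} is empty")
        elif len(text) >= MAX_LEN:
            problems.append(f"{name} has {len(text)} characters")
    return problems


weishaupt_description = (
    "* 6 February 1748 Ingolstadt, + 18 November 1830 Gotha, professor of law, "
    "writer and privy councillor, member of the Illuminati under the code names "
    "Spartacus and Scipio Aemilianus"
)
print(header_problems("Johann Adam Weishaupt", weishaupt_description))
```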

How to make statements on individual Items

Most databases will require you to decide beforehand what kind of objects you will create and how exactly they can be related to each other. Wikibase is structurally open. Items gain meaning just as they do in regular communication through the statements in which they appear.

For the database, all its Items are merely Q-numbers, linked with each other with the help of P-numbers. Alternatively, an Item can link to a date or a unit. Linking Items (Q-numbers) with the help of Properties (which are just P-numbers) has the advantage that both can be named and identified in any language imaginable. An Item does not gain its qualities through a categorisation but through the statements in which it appears: statements like "is the father of" or "was born in" will create persons, not houses.

There is no restriction of categorial fields. Anything that can play a role in a statement can become an object or a property.

Triple-based statements

Statements in the FactGrid take the form of triples, three-part atomic statements such as

Johann Christian Bach's (Item:Q147795) father is (Property:P141) Johann Sebastian Bach (Item:Q147798)

The Q-P-Q triple is the most interesting option, as it brings database objects into relative positions towards each other. Most conventional databases would require a "string" input, such as the combination of characters "Köln" for a person's place of birth. Compared with such a string, the reference to the respective database object Item:Q10400 has three big advantages:

  • the reference introduces an Item that is itself the object of various statements. The string sequence "Köln" does not come with geo-coordinates or external identifiers, but these will be in the package of Item:Q10400.
  • the reference introduces an Item that can be handled in numerous languages: Anglophone readers will read "Cologne" on the Item:Q10400 instead of "Köln".
  • the reference to an Item instead of a string brings precision into the database. If there are several places with the same name, you will select the Q-number of the object you mean, the one with the right geo-coordinates, the right country, etc.

Besides Q-P-Q triples, statements can also point to dates, quantities, strings of characters (for example in order to state a particular wording or spelling in a quotation), URLs or identifiers in other databases. An alternative triple can thus be:

Johann Christian Bach (Item:Q147795) was born on (Property:P77) September 5, 1735

or

Johann Christian Bach (Item:Q147795) has the GND identification number (Property:P76) 118505521
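The three example triples differ only in the type of value they point to. The following is a minimal sketch of that idea; the class names (`ItemRef`, `DateValue`, `ExternalId`, `Triple`) are hypothetical illustrations and do not reflect how Wikibase stores its data internally.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    qid: str  # reference to another database object, e.g. "Q147798"


@dataclass(frozen=True)
class DateValue:
    iso: str  # calendar date, e.g. "1735-09-05"


@dataclass(frozen=True)
class ExternalId:
    value: str  # identifier in another database, e.g. a GND number


Value = Union[ItemRef, DateValue, ExternalId]


@dataclass(frozen=True)
class Triple:
    subject: ItemRef
    prop: str  # P-number
    value: Value


triples = [
    Triple(ItemRef("Q147795"), "P141", ItemRef("Q147798")),     # father
    Triple(ItemRef("Q147795"), "P77", DateValue("1735-09-05")),  # date of birth
    Triple(ItemRef("Q147795"), "P76", ExternalId("118505521")),  # GND identifier
]


def item_links(all_triples):
    """Keep only the Q-P-Q triples, i.e. those that link two Items."""
    return [t for t in all_triples if isinstance(t.value, ItemRef)]


print(item_links(triples))
```

Only the first triple links two Items; the other two end in a plain value that cannot itself become the subject of further statements.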

When creating new properties, one has to consider which type of link they should create.

Statements can be qualified in any complexity

Triples on their own would be comparatively rough statements. That is why Wikibase allows triples to be "qualified" with additional statements. Adam Weishaupt, for example, was married twice, so it makes sense to qualify the entries in more detail. The first marriage is the triple:

Johann Adam Weishaupt (Item:Q1308) was married to (Property:P84) Maria Afra Johanna Walburga Weishaupt (born Sausenhofer) (Item:Q23424)
— on (Property:P49) July 11, 1773
— Place of marriage (Property:P132) Eichstätt (Item:Q10340)
— Best man was (Property:P340) Anton Härtl (Item:Q97362)
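For a mass input with the QuickStatements tool described further below, the same qualified statement could be written as a single command line in which each qualifier follows the main triple as a further property and value pair. The following is a minimal sketch, assuming the tab-separated command format and the date notation given on the Wikidata QuickStatements help page; the helper names `day` and `statement_line` are hypothetical.

```python
def day(iso_date):
    """Format a YYYY-MM-DD date with day precision (the trailing /11)."""
    return f"+{iso_date}T00:00:00Z/11"


def statement_line(subject, prop, value, qualifiers=()):
    """Join a triple and its qualifier pairs into one tab-separated line."""
    parts = [subject, prop, value]
    for q_prop, q_value in qualifiers:
        parts += [q_prop, q_value]
    return "\t".join(parts)


line = statement_line(
    "Q1308", "P84", "Q23424",          # Weishaupt, married to, first wife
    qualifiers=[
        ("P49", day("1773-07-11")),    # date
        ("P132", "Q10340"),            # place of marriage: Eichstätt
        ("P340", "Q97362"),            # best man: Anton Härtl
    ],
)
print(line)
```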

When entering such statements manually, it is not necessary to know the respective Q- or P-numbers. The moment you type into the input fields, the database will suggest Items you might be thinking of, insofar as they are already in the database. If they are not, you have to create these Items first; otherwise the open input field will not accept the statement.

Statements can be complexly substantiated with references

Any number of source documents can be added to all statements. The most important properties here are:

  • Property:P51 "primary source" — for the historical document from which research literature takes or should take the information,
  • Property:P12 "literature" — for reference to research literature on the subject (qualifiers can be used to add exact page references),
  • Property:P129 "according to" — to build distance from the (doubtful) statement.

Contradicting statements can be made and are valuable

Wikibase instances allow multiple contradicting statements to be made on objects. This is not a defect, but makes sense, for example, as soon as individual documents provide these contradicting statements on an issue such as the date of birth.

  • You can leave these answers side by side and support each of them with the respective sources; they will now appear side by side in searches.
  • You can evaluate the conflicting statements individually by qualifying each statement with Property:P155 "How sure is this?". A wide range of pre-structured claims are available for this property.
  • You can also ensure that incorrect information will no longer appear in searches. To do this, the markings at the beginning of the input field are lowered or raised accordingly before saving the statement. Statements that have been disqualified as obsolete are retained in the database, where they remain useful for preventing relapses into the deprecated state of information.

The "Directory of Properties" lists all the statements that can currently be made on FactGrid

All statements that can currently be made with the FactGrid are listed in the Directory of Properties both in an overall list and under individual questions on the contents page.

The left column gives the statements according to where you want to make them — for example in a biography or in a database object for a document.

In the right column they are arranged according to the object you want to target — you may want to make a statement on a person (left column) in order to state the organisations (right column) he or she was a member of.

We can define any imaginable statement

Wikibase is extremely open when it comes to database modeling. Any object that can attract statements is able to become a Q-Item on a Wikibase installation. For the database, the statements themselves are Q-Numbers connected by P-numbers. The labelling of these numbers is at the discretion of the user.

The establishment of a new statement is not entirely unproblematic:

  • The respective statement might already exist in a wording that did not cross your mind (you can repair this by defining the alias you would prefer to use).
  • One might actually prefer to remain content with a less precise statement in order to keep search results together; the desired precision can be gained with qualifiers.
  • The data model might not make much sense. For instance, an organization could have a "generalissimo" in its hierarchical structure, but a property for this position is not particularly practical, since it presupposes that seekers already know what kind of positions they have to look for in the organization. For a search for the leaders there is instead a property that asks about the people in a leading position in an organization, and qualifiers show the position of the people in question.

With the FactGrid project, we are interested in giving research projects the greatest possible freedom in their respective questions, so properties have so far been set up without any great coordination hurdles. However, in order to avoid technical mistakes and to keep the use of the database transparent, we urge a short discussion process on the site.

Nevertheless, if you create properties yourself under the pressure of your project, make sure that you anticipate the type of statements you want to make, i.e. whether they should refer to a Q-number (a database object), a P-number (a property), a date, a URL or a database ID. Once a property has been set, its value type is fixed and it can no longer accept values of any other type. It is advisable to coordinate with people who have already set up properties in FactGrid.

FactGrid is multilingual (as far as you teach it)

Wikibase is designed to be used by a multilingual community. The interface language can be changed accordingly by anyone who has an account. But the option to speak practically any language goes much further: all the Items and all the Properties are ultimately only Q- and P-numbers. The different language labels and descriptions turn them into multilingual objects that can occur in as many languages as have contributed the necessary labels and descriptions.

To assign labels and descriptions in a new language, you have to log in and set this language in the "Preferences" section of your user settings.

English is the default language on FactGrid. In any other language, labels and descriptions will appear as stated in English until an alternative label or description has been set on the Item or Property. It is therefore in the interest of all non-registered users that all Q-numbers and all P-numbers are also labelled in English.

If you are comfortable with English, you should begin a manual input (of Items one by one) in English and add the label and description in your own language in a second step after saving. If you feel less at home in English, you will have to rely on other users to supply the English translations; until this has been done, your data will just appear as Q-numbers and P-numbers in searches in other languages.

The mass input of data via QuickStatements

If data is already available in a spreadsheet, whether Excel or Google, or in a CSV (comma-separated-value) table, it can also be entered in a serial input. The tool for this, QuickStatements, is linked in the menu on the left.

Data preparation

Compared to entering individual statements the mass input will require quite some preparation: The database can no longer make suggestions during input; you have to find out ahead of time whether the Items you will need are already in the database.

A simple table might state the birth places of several people, among them Johann Sebastian Bach Item:Q147798 (Eisenach Item:Q10341) and his son Johann Christian Bach Item:Q147795 (Leipzig Item:Q10408). Property:P82 is designated to state the respective place of birth, so this is the input which you have to prepare in this case:

Q147798 P82 Q10341
Q147795 P82 Q10408

To reach this point you will have to get the Q-numbers for the persons on your list and the Q-numbers for the places referred to. This will require some FactGrid database mining in the first place, in order to find all the persons and places in the database whose Q-numbers must replace the respective names. Things get even more difficult when you have to create Items in the preparation process. Any negligent work will create masses of duplicates that will cause new work in the merging processes.

The Wikidata-standard tool is OpenRefine. Most FactGrid users are using Excel or Google spreadsheets and the Vertical Lookup feature to retrieve the Q-Numbers from Imports and to make the replacements in their lists.
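The replacement of names with Q-numbers that a vertical lookup performs in a spreadsheet can be sketched in a few lines. This is an illustration only, assuming that the lookup table has been exported from FactGrid beforehand and that names are unique; the function name `to_statements` and the row "Erika Doe" are hypothetical. Matching by name alone can pick the wrong namesake, so unmatched or ambiguous names should be checked by hand rather than created blindly.

```python
# Lookup table: name as it appears in your list -> Q-number in FactGrid.
known_items = {
    "Johann Sebastian Bach": "Q147798",
    "Johann Christian Bach": "Q147795",
    "Eisenach": "Q10341",
    "Leipzig": "Q10408",
}

rows = [
    ("Johann Sebastian Bach", "Eisenach"),
    ("Johann Christian Bach", "Leipzig"),
    ("Erika Doe", "Leipzig"),  # not in the lookup table
]


def to_statements(table_rows, lookup, prop="P82"):
    """Turn (person, place) rows into tab-separated lines; collect unknown names."""
    lines, missing = [], []
    for person, place in table_rows:
        unknown = [name for name in (person, place) if name not in lookup]
        if unknown:
            missing.extend(unknown)
            continue  # skip the row instead of guessing a Q-number
        lines.append("\t".join([lookup[person], prop, lookup[place]]))
    return lines, missing


lines, missing = to_statements(rows, known_items)
print("\n".join(lines))
print("still to be identified or created:", missing)
```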

Calendar dates demand their own preparation. Exact input conventions have to be considered in any mass input.
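As an illustration of such a convention: the Wikidata QuickStatements help describes dates as a timestamp followed by a precision code (9 for a year, 10 for a month, 11 for a day). The following is a minimal sketch under that assumption, restricted to years from 1 to 9999 and ignoring the distinction between Julian and Gregorian calendars; the function name `qs_date` is hypothetical, and the exact notation should be checked against the help page before a real batch.

```python
from datetime import date


def qs_date(year, month=None, day=None):
    """Format a date with year (/9), month (/10) or day (/11) precision."""
    if not 1 <= year <= 9999:
        raise ValueError("this sketch only handles years 1 to 9999")
    if day is not None and month is None:
        raise ValueError("a day needs a month")
    if month is None:
        return f"+{year:04d}-00-00T00:00:00Z/9"
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if day is None:
        return f"+{year:04d}-{month:02d}-00T00:00:00Z/10"
    date(year, month, day)  # raises ValueError for impossible dates such as 30 February
    return f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z/11"


print(qs_date(1735, 9, 5))
print(qs_date(1773, 7))
print(qs_date(1748))
```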

Finally, the depth of the required input should not be underestimated: To link a statement to a place name is simple in a database that is content with a string input. The Q-item solution is more complex. It is easy to generate Q-items for places but they will not appear on maps unless they have received geographic coordinates; they should be linked to external identifiers in order to gain uniqueness. All this is easily overlooked when calculating the workload of a mass input in Wikibase. The software is bright but the depth it offers comes at its own price.

The exact instructions for the transformation of information in the run-up to the mass input can be found on this help page of the Wikidata project, where it is constantly updated by the bigger community:

https://www.wikidata.org/wiki/Help:QuickStatements

Use prepared batch fragments for the standardised creation of database objects

To create items one by one in regular work you may use Batch fragments, which you will find in the left hand menu.

The good thing about these fragments is that they create items in three languages with at least some uniformity. You can copy the text in the grey area and paste it into QuickStatements, where you can exchange the # and $ signs with the text you want to have on the new item:

qid, Lde, Len, Lfr, Dde, Den, Dfr, P2
, "#", "#", "#", "Familienname", "family name", "nom de famille", Q24499

In this case you will just give the family name three times for the three languages German, English, and French. The fragments are given as CSV input (i.e. as comma-separated-value input). The database will ask you at the beginning for your permission to connect to your account. Allow this and send off the command as CSV input. A page will tell you what is about to happen. Press "Start" to send off the batch input.
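When several Items of the same kind are needed, the fragment can be filled in by a small script instead of by hand. The following is a minimal sketch built on the family-name fragment above, assuming that QuickStatements accepts standard CSV in which fields are only quoted where necessary; the function name `family_name_batch` is hypothetical. Check beforehand that the names are not yet in the database, since every row creates a new Item.

```python
import csv
import io

HEADER = ["qid", "Lde", "Len", "Lfr", "Dde", "Den", "Dfr", "P2"]


def family_name_batch(names):
    """Build the CSV text for one new family-name Item per given name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for name in names:
        # The empty first field (qid) mirrors the fragment: a new Item is created.
        writer.writerow([
            "", name, name, name,
            "Familienname", "family name", "nom de famille",
            "Q24499",
        ])
    return buffer.getvalue()


batch = family_name_batch(["Weishaupt", "Rudorf"])
print(batch)
```

Using the `csv` module rather than plain text replacement means that names containing commas or quotation marks are escaped correctly.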

See "Batch fragments" for some popular fragments. More complex work routines can be tailored to your own project with any number of statements that you plan to make on several objects (like a connection to your research project or to a common topic).

Internet demonstrations on the use of QuickStatements

There are several videos on YouTube on how to use QuickStatements when entering prepared table information. With their screen demonstrations, these make the procedure clearer than cumbersome continuous text descriptions. The Wikidata help also provides extremely thorough explanations on questions of detail.

https://www.youtube.com/results?search_query=quickstatements+wikidata