Inferencing

From Linux Web Expert

Revision as of 16:28, 13 January 2020 by Nicolas NALLET


Semantic search can be used to find pages based on the information that users have entered about them into the wiki. This simplifies many tasks, but it still requires that the semantic information be entered manually in the first place. Sometimes one would expect the wiki to be «smarter» and to deduce some facts even if they were not entered directly. In certain cases, SMW can draw such inferences automatically, as described in this article.

Subcategories

MediaWiki supports a hierarchy of categories that helps to simplify the category statements needed on each page. As an example, assume that a wiki uses categories «Person», «Woman», and «Man». It is obvious to a human that every page that belongs to the category «Woman» should also belong to the category «Person». But «Woman» is clearly more specific, and many wikis (including Wikipedia) have a policy to use only the most specific categories on a page; otherwise the page would often have to contain dozens of categories that are hard to maintain. To indicate that one category is more specific than another, MediaWiki supports a category hierarchy in which users can state that one category is a subcategory of another. This is done simply by putting a category on the subcategory's page, e.g. the page Category:Woman could contain the text

[[Category:Person]]

For details, see the MediaWiki handbook.

By default, SMW uses this subcategory information in semantic queries: when asking for all pages that belong to a category, it will also find all pages that belong to any subcategory of that category. In the above example, the query [[Category:Person]] would also return the pages in categories «Man» and «Woman». In other words, the actual query corresponds to

[[Category:Person]] OR [[Category:Woman]] OR [[Category:Man]]

If the category hierarchy is deeper, then SMW will also include further subcategories. For example, one may have a category «Mother» of all women who have children, and this again would be a subcategory of «Woman». Then the above query would retrieve all pages in category «Mother» as well.

SMW's subcategory inferencing can be restricted or disabled by the site administrator. Normally, it supports only a certain maximal depth of category hierarchies, so it may not return all results if very long chains of subcategories are involved. A manually created query with OR, as above, is a work-around in this case, but it does not, of course, reflect later changes in the category hierarchy.
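The expansion and the depth limit described above can be modelled as a depth-limited walk over the category hierarchy. The following Python sketch is an illustration only and not SMW's actual implementation; the names SUBCATEGORIES, expand_category, and max_depth are hypothetical, and the data is the «Person» example from this section.

```python
from collections import deque

# Hypothetical hierarchy from the example: category -> direct subcategories.
SUBCATEGORIES = {
    "Person": ["Woman", "Man"],
    "Woman": ["Mother"],
}


def expand_category(root, subcategories, max_depth):
    """Return root plus every subcategory reachable within max_depth steps."""
    found = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: deeper subcategories are ignored
        for child in subcategories.get(category, []):
            if child not in found:
                found.add(child)
                queue.append((child, depth + 1))
    return found


# A query for "Person" then behaves like an OR over all categories found.
print(sorted(expand_category("Person", SUBCATEGORIES, max_depth=2)))
print(sorted(expand_category("Person", SUBCATEGORIES, max_depth=1)))
```

With a limit of 2 the model reaches «Mother»; with a limit of 1 it stops at «Woman» and «Man», which corresponds to the incomplete results mentioned above for long subcategory chains.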

In some cases, wikis use categories and category hierarchies that are not suitable for being treated in the above way. For example, Wikipedia uses a category called «Cities» not for collecting all cities but for all articles that are related in some ways to cities. Even the category «Cities in Canada» is used to collect all pages that have some relationship with that topic. This is not an actual problem of categories or category hierarchies: the semantic query [[Category:Cities]] still returns all pages related to that topic, it just does not return actual cities only. So one might argue that the name of the category is confusing in some sense, but this is merely a matter of how to organise a wiki. If a wiki has no category for actual cities as such, then no semantic query can produce all cities directly.

A more serious problem in large wikis might be what is called «semantic drift». This occurs if the exact intention of some category is not really specified, e.g. because it lacks a detailed description on its page. Different users may then have slightly different readings of the category's meaning, and this may influence how they use subcategory statements. For example, some editors may reasonably say that «Priest» is a subcategory of «Religious office» (referring to the job category), while others may deem «Female priest» to be a subcategory of «Priest» (referring to the class of people having that job). Together, these statements would imply that all pages in «Female priest» are also implicitly categorised in «Religious office», thus confusing people and job occupations. It is therefore important to always clearly describe on a category page what should go into a category and what shouldn't, and also to point to alternative categories that may be suitable.

Since Semantic MediaWiki 3.0.0, it is possible to limit the resolving of hierarchies for individual queries using the query condition |+depth=.

Subproperties

Just like categories, properties can be more specific than one another. For example, a wiki may have the property «capital of» to relate cities to countries, and a property «located in» that generally describes that some city is located in some country. Every capital city must also be located in the country that it is the capital of. In other words, «capital of» is a subproperty of «located in». Whenever a user states that a page is the capital of some country, SMW should then also conclude that the page has an (implicit) «located in» relation to that country. To state this in the wiki, the following can be entered on the page Property:Capital of:

[[subproperty of::Property:located in]]

Once this has been stated, a query [[located in::Germany]] will also return the capital Berlin even if no «located in» property is given on that page. Similar considerations as in the case of categories apply, and detailed descriptions on property pages are a good method for avoiding semantic drift.
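The effect of a subproperty statement on queries can be modelled in a few lines. The following Python sketch is a simplified illustration and not how SMW stores or evaluates data; FACTS, SUBPROPERTY_OF, matching_properties, and query are hypothetical names, and the pages Munich and Paris are made-up sample data added for contrast.

```python
# Hypothetical explicit annotations as (page, property, value) triples.
FACTS = [
    ("Berlin", "capital of", "Germany"),
    ("Munich", "located in", "Germany"),
    ("Paris", "capital of", "France"),
]

# Subproperty statements: property -> the more general property.
SUBPROPERTY_OF = {"capital of": "located in"}


def matching_properties(prop, subproperty_of):
    """Return prop together with its direct and indirect subproperties."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for sub, parent in subproperty_of.items():
            if parent in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def query(prop, value, facts, subproperty_of):
    """Pages annotated with prop::value, counting subproperties as matches."""
    props = matching_properties(prop, subproperty_of)
    return sorted(page for page, p, v in facts if p in props and v == value)


print(query("located in", "Germany", FACTS, SUBPROPERTY_OF))
print(query("capital of", "Germany", FACTS, SUBPROPERTY_OF))
```

Note that the inference runs in one direction only: the query for «located in» also finds Berlin, while the query for «capital of» does not find Munich.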

Since Semantic MediaWiki 3.0.0, it is possible to limit the resolving of hierarchies for individual queries using the query condition |+depth=.

Equality of pages: redirects

It often happens that a thing can be referred to by different names, such as in the case of "SMW", which is synonymous with "Semantic MediaWiki". In MediaWiki, this is solved by redirects that forward readers from one page to another. But synonyms may be even more important in a semantic wiki, where one wishes to organise content and make it more accessible. If different editors use different page names in annotations, then it is hard to create queries that still display a unified view of the wiki content.

SMW therefore treats all redirects between pages as synonyms. This means that it does not matter at all whether a redirect page or the actual target page is used in a query or annotation. SMW internally uses only redirect targets to work with, and all functions will take the redirect structure into account. This mechanism works only for immediate redirects: redirects that point to other redirect pages are not supported and should be resolved (this is also the case in MediaWiki anyway).
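The following Python sketch models this behaviour in a simplified way: every page name is replaced by its redirect target before values are compared, and only a single redirect step is followed. It is an illustration under stated assumptions, not SMW's implementation; REDIRECTS, resolve, pages_with, and all page and property names are hypothetical.

```python
# Hypothetical redirect table: redirect page -> target page.
REDIRECTS = {
    "SMW": "Semantic MediaWiki",
    "Semantic MW": "SMW",  # a double redirect, which is not supported
}

# Hypothetical annotations as (page, property, value) triples.
ANNOTATIONS = [
    ("Wiki A", "uses", "SMW"),
    ("Wiki B", "uses", "Semantic MediaWiki"),
]


def resolve(page, redirects):
    """Follow at most one redirect step (immediate redirects only)."""
    return redirects.get(page, page)


def pages_with(prop, value, annotations, redirects):
    """Pages annotated with prop::value, treating redirects as synonyms."""
    target = resolve(value, redirects)
    return sorted(
        page
        for page, p, v in annotations
        if p == prop and resolve(v, redirects) == target
    )


print(pages_with("uses", "SMW", ANNOTATIONS, REDIRECTS))
print(pages_with("uses", "Semantic MediaWiki", ANNOTATIONS, REDIRECTS))
print(resolve("Semantic MW", REDIRECTS))
```

Both queries return the same pages, whichever name is used. The last call shows why double redirects should be resolved by hand: a single step from «Semantic MW» ends on another redirect page rather than on the actual target.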

Since SMW 1.2, it is also possible to use redirects on properties and categories with the same effect, so multiple synonyms for properties can be created. Using this feature for categories is not recommended, though, because MediaWiki's category functions will still ignore category redirects, so some wiki features will not work as expected. Redirects between two different namespaces, such as redirects from normal pages to properties or from properties to categories, are not supported in a special semantic way. They still create normal MediaWiki redirect pages but nothing else.

Inferencing and printout statements

Printout statements generally do not perform any inferencing, i.e. they return only the statements that are explicitly made for some page. This is desired in some situations, and it may be a limitation in others. A work-around can be to use a template for annotation, and to give two property values explicitly in that template, essentially by writing something like

[[capital of::located in::Germany]]

which is the same as writing [[capital of::Germany]] and [[located in::Germany]], but it will show only one link to Germany.

Inferencing features that are not supported

It sometimes happens that ambitious contributors in a wiki will create properties that also suggest a specific meaning for automated deduction. It should therefore be noted that SMW does not support any of the following features:

  • Transitivity
  • Domain and range restrictions
  • Number restrictions and functional properties
  • Symmetric and asymmetric properties
  • Chain axioms

Even if properties that sound like the above are introduced, and even if these are linked to well-known properties in ontology languages such as OWL, RDFS, SKOS, etc., SMW will not use these annotations to perform smarter queries. To prevent confusion, it is advisable not to use names that resemble established notions in existing ontology languages, or at least to clearly document this limitation on the property pages.

To some extent, one may be able to craft queries to achieve a similar effect. The sample pages Germany and California show examples of queries for inverse relationships; the sample page Germany shows an example of a subquery that approximates a transitive relationship to some extent.
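Why a subquery only approximates transitivity can be seen in a small model. The Python sketch below is an assumption about what such a query amounts to, not a copy of the query on the sample page: it treats a nested subquery as one extra lookup step over explicit «located in» statements. LOCATED_IN, located_in, and the place names other than Germany are hypothetical.

```python
# Hypothetical explicit annotations: page -> the page it is "located in".
LOCATED_IN = {
    "Bavaria": "Germany",
    "Munich": "Bavaria",
    "Schwabing": "Munich",
}


def located_in(targets, facts):
    """Pages whose explicit 'located in' value is one of the targets."""
    return {page for page, place in facts.items() if place in targets}


# One lookup step, like a plain query for pages located in Germany.
direct = located_in({"Germany"}, LOCATED_IN)

# A second step over the first result, like a query with one nested subquery.
two_hops = located_in(direct, LOCATED_IN)

print(sorted(direct | two_hops))
```

The combined result contains Bavaria and Munich but not Schwabing, which is three steps away from Germany. Each additional level needs another nested step, so chains of arbitrary length are not covered.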



This documentation page applies to all SMW versions from 1.5.0 to the most current version.