Help:Property similarity

From Linux Web Expert

Semantic MediaWiki's unconstrained schema approach lets users create or define properties freely. With that freedom, conceptually identical or near-duplicate properties (similar properties) can arise and be used for value annotations without being detected by an agent engaged in a data curation[1] task.

Several methods can help prevent syntactic similarity issues in the first place, such as:

  • Use of templates to formalize user input
  • Use of #REDIRECT to build a pool of synonyms around a canonical property and allow them to be merged[2] into a coherent extension of the property's semantics.

Syntactically similar properties should be cleaned up and removed during the task called semantic gardening if they are in fact not different from each other. See an example of this on the sandbox wiki.[3] Semantic MediaWiki 2.5.0 introduced syntactic property similarity evaluation as well as the special page "PropertyLabelSimilarity", which assists in displaying syntactically similar properties and performing the task of semantic gardening.[4]

Exemption

The configuration parameter $smwgSimilarityLookupExemptionProperty defines the property used to express an exemption condition, that is, to exclude a property from syntactic similarity evaluation. By default this property is called "owl:differentFrom".

For example, on the property page "Governance level" one may annotate [[owl:differentFrom::Governance level of]], which suppresses the similarity lookup for the properties "Governance level" and "Governance level of" when they are compared to each other. The annotation makes explicit that the two properties are syntactically similar but conceptually different, so they will not be shown on the special page "PropertyLabelSimilarity". See the respective example on the sandbox wiki.[5]
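The effect of the exemption can be illustrated with a small Python sketch. It is not Semantic MediaWiki's implementation: the `properties` dictionary, the helper names, the extra labels "Has governance level" and "Population", the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions made for illustration. Only the property name "owl:differentFrom" and the two "Governance level" labels come from the text above.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Default exemption property name, as described in the text above.
EXEMPTION_PROPERTY = "owl:differentFrom"

# Hypothetical wiki content: property label -> annotations on its property page.
properties = {
    "Governance level": {EXEMPTION_PROPERTY: {"Governance level of"}},
    "Governance level of": {},
    "Has governance level": {},
    "Population": {},
}


def is_exempt(a, b, props):
    """True if either property declares the other via the exemption property."""
    return (
        b in props[a].get(EXEMPTION_PROPERTY, set())
        or a in props[b].get(EXEMPTION_PROPERTY, set())
    )


def similar_pairs(props, threshold=0.8):
    """Return label pairs at or above the threshold, skipping exempt pairs."""
    pairs = []
    for a, b in combinations(sorted(props), 2):
        if is_exempt(a, b, props):
            continue  # suppressed: declared conceptually different
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


result = similar_pairs(properties)
```

In this sketch the pair "Governance level" / "Governance level of" is skipped because of the annotation, while other look-alike labels are still reported for gardening.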

Syntactic vs. semantic similarity

Syntactic similarity is understood as a function that "analyzes the syntactic similarity of a pair of tags" using the "Levenshtein Distance, the Cosine Similarity, the Jaccard Similarity, the Jaro Distance"[6]:100, while semantic similarity analyzes the "semantic relations defined between tags as well as their frequency"[6]:101.
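As a rough illustration of two of the syntactic measures named in the quotation, the following Python sketch computes a Levenshtein distance and a Jaccard similarity for a pair of property labels. It is a minimal sketch under stated assumptions, not the code used by Semantic MediaWiki: the function names are hypothetical, and tokenizing labels into lower-cased words for the Jaccard measure is one possible choice among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased word tokens (an assumed tokenization)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty labels are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


distance = levenshtein("Governance level", "Governance level of")
overlap = jaccard("Governance level", "Governance level of")
```

For the two labels from the exemption example, the distance is 3 (the appended " of") and the word overlap is 2/3, which shows why such labels are flagged as syntactically similar even when they are conceptually different.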

Example

See also

Notes

  • bilenko2003adaptive 
  • sagae2009clustering 
  • bollegala2007measuring 

References

  1. ^ "...term used to indicate processes and activities related to the organization and integration of data collected from various sources, annotation of the data, and publication and presentation of the data..." from Data curation. (2017, February 13). In Wikipedia, The Free Encyclopedia. Retrieved 21:51, March 25, 2017.
  2. ^ duanuailua2012string
  3. ^ Semantic MediaWiki: GitHub pull request #2244 example #1
  4. ^ gh:smw:2244
  5. ^ Semantic MediaWiki: GitHub pull request #2244 example #2
  6. a b book:sci.352 
