Extension:Cargo/Cargo and Semantic MediaWiki

From Linux Web Expert

Semantic MediaWiki (SMW) is an extension to MediaWiki that lets you store and query data. It has a large number of spinoff extensions -- around 30 active ones -- that make use of it, and together turn an SMW-based system into something resembling a full-fledged, easy-to-use data framework.

The Cargo extension was consciously designed to mimic the full system of SMW and many of its spinoff extensions, in its syntax options and overall interface. In a few cases, code itself has been copied over as well, though in a modified form. In all, Cargo provides some or all of the functionality of eight extensions from the SMW "family": Semantic MediaWiki, Semantic Result Formats, Maps, Semantic Drilldown, Semantic Compound Queries, Semantic Internal Objects, Semantic Scribunto and Semantic Dependency Updater.

Design differences

If Cargo is essentially just a clone of SMW and some other extensions, why was it created in the first place? And why should anyone use it? Cargo does have a number of differences from SMW that give it some advantages.

Philosophically, Cargo differs from SMW in three main ways:

  • Cargo ties data storage directly to templates. In SMW, semantic values can be placed anywhere on the page, even though in practice they're usually confined to templates; but in Cargo, it is the template itself that is responsible for storing its data.
  • Cargo stores its data in as simple a fashion as possible, using standard database tables to hold tabular data, while SMW uses a database to represent "triples" of data.
  • Though this is a more minor difference, Cargo is less customizable than SMW and its spinoff extensions, opting instead to base display settings on the data itself.

The first two differences in particular allow Cargo's code for both storage and querying to be much simpler than that of SMW. Cargo lets users make near-direct use of SQL "SELECT" statements, which means that a custom query language does not need to be defined or supported. It also means that Cargo's own code for displaying query results in various formats can be significantly simpler than the corresponding code in SMW, SRF etc. And it means that the setup and maintenance work for administrators can be simpler. Cargo, a single extension, can take the place of about 15 extensions: the eight extensions listed above, plus another seven or so "library" extensions required by Semantic MediaWiki, like DataValues.
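The contrast between tabular and triple storage can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the wiki's database; the Books table, its fields and the simplified triples table are hypothetical and do not reflect SMW's actual schema. The point is only that the tabular layout answers the question with one plain SELECT, while the triple layout needs a self-join per property.

```python
import sqlite3

def build_demo() -> sqlite3.Connection:
    """Create the same two books in a tabular layout and a triple layout."""
    conn = sqlite3.connect(":memory:")
    # Tabular layout: one row per template call, one column per field.
    conn.execute("CREATE TABLE Books (_pageName TEXT, Author TEXT, Year INTEGER)")
    # Triple layout (simplified assumption): one row per (subject, property, value).
    conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
    books = [("Dune", "Frank Herbert", 1965), ("Emma", "Jane Austen", 1815)]
    for page, author, year in books:
        conn.execute("INSERT INTO Books VALUES (?, ?, ?)", (page, author, year))
        conn.execute("INSERT INTO triples VALUES (?, 'Author', ?)", (page, author))
        conn.execute("INSERT INTO triples VALUES (?, 'Year', ?)", (page, str(year)))
    return conn

# Tabular storage: the question maps onto a single SELECT.
TABULAR_QUERY = "SELECT _pageName, Author FROM Books WHERE Year < 1900"

# Triple storage: each property involved needs its own join on the subject.
TRIPLE_QUERY = """
SELECT a.subject, a.value
FROM triples AS a
JOIN triples AS y ON y.subject = a.subject
WHERE a.property = 'Author'
  AND y.property = 'Year'
  AND CAST(y.value AS INTEGER) < 1900
"""

conn = build_demo()
tabular_rows = conn.execute(TABULAR_QUERY).fetchall()
triple_rows = conn.execute(TRIPLE_QUERY).fetchall()
print(tabular_rows)
print(triple_rows)
```

Both queries return the same row; the difference is in how much query-building logic has to sit between the user's request and the database.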

Features checklist

The table below shows the main set of functionality that SMW-based sites tend to make use of, and how it is, or is not, available in a Cargo-based system.

Feature | SMW-based system | Cargo-based system | Notes
Data storage | Semantic MediaWiki | Cargo | Functionality mostly matched except for saving to an RDF triplestore.
Querying and basic display | Semantic MediaWiki | Cargo | The #cargo_query function is generally more powerful than SMW's equivalent #ask, due to its closeness to SQL. One feature that SMW has and Cargo lacks is the ability to query on a category and get back all the pages within subcategories of that category. However, in Cargo you can instead define a manual hierarchy structure, then query to get all pages matching a value or one of its sub-values.
Data browsing interfaces | Semantic MediaWiki, Semantic Drilldown | Cargo | Within the SMW system are special pages for doing free-form querying (Special:Ask and Special:SearchByProperty), viewing all data for a single page (Special:Browse), viewing all values for a single property (property pages) and filtering on the data (Semantic Drilldown's Special:BrowseData). In Cargo, you can do free-form querying at Special:CargoQuery, view the data for a single page at "?action=pagevalues", view all values for a table (not just one field) at Special:CargoTables, and filter on the data at Special:Drilldown.
More complex visualizations | Semantic Result Formats, Maps | Cargo | Cargo offers most of the display formats defined in SRF and other extensions, including maps, calendars, timelines and bar charts. In some cases, like SRF's mathematical formats, the functionality is provided not by a result format but directly through Cargo's SQL-based querying.
Maps for individual points | Maps | Cargo | Done via the #cargo_display_map function.
Form-based page editing | Page Forms | Page Forms | PF forms can make use of Cargo metadata (for input types and the like) and data (for autocompletion), in the same way that they would for SMW.
Storage of n-ary data | SMW (#subobject), Semantic Internal Objects | Cargo | The Cargo system inherently allows storage and querying of any "dimension" of arrays.
Compound queries | Semantic Compound Queries | Cargo | Done via the #cargo_compound_query function.
Helper forms to create data structure pages | Page Forms, Page Schemas | Page Forms, Page Schemas | Within the SMW system, the four components of a data structure are categories, forms, templates and properties, and the Page Forms and Page Schemas extensions provide functionality to automatically create all four. With Cargo, there are no properties, but Page Forms and Page Schemas can be used to create categories, forms and templates.
Storage and display of outside data | External Data + SMW | External Data + Cargo | External Data can store its data via Cargo, simply by defining Cargo storage within some template and then calling External Data's #display_external_table function with that template.
Notification of data changes | Semantic Watchlist | EditNotify | EditNotify lacks the user interface of Semantic Watchlist, but it can be used to notify users when there are changes to specific template fields, which is the equivalent of what Semantic Watchlist does.
Storage of page metadata | Semantic MediaWiki, Semantic Extra Special Properties | Cargo | SMW allows for storing and querying metadata via special properties. In Cargo this can be done using the _pageData and _fileData tables. Data that can be stored with both extensions includes the date any page was created, the date it was last modified, the username of the user who created the page, and the number of edits the page has had.
Calling queries from within Scribunto modules | Semantic Scribunto | Cargo | Scribunto is a MediaWiki extension that lets you embed scripts, written in the Lua language, in wiki pages.
Tooltips | Semantic MediaWiki (#info) | RegularTooltips | The RegularTooltips extension, with its #info-tooltip function, provides equivalent functionality to SMW's #info; #info-tooltip is actually somewhat superior because it works correctly in multiple-instance templates within forms, whereas #info does not.
Page cache refresh/purge tab for administrators | Semantic MediaWiki | Cargo | Both SMW and Cargo supply such a tab; in SMW it is called "Refresh", while in Cargo it is called "Purge cache". The tab supplied by Cargo is somewhat superior in that it doesn't require the admin to also hit "OK" before the cache is purged.
Automatic cache refresh of pages with queries whose data has changed | SemanticDependencyUpdater | Cargo | This is done in Cargo via the cargo_backlinks database table.

Advantages of Cargo

The previous section covered the ways in which Cargo does and does not measure up to Semantic MediaWiki's abilities. But there are some things that Cargo can do better, or which SMW currently cannot do at all. These are listed below.

More powerful querying

The use of near-direct SQL enables Cargo to run queries that are not easily possible in SMW. These include:

  • Finding blank values. You can get the set of pages that do not have a value for some field; this is not possible with SMW.
  • Sorting with blank values. With SMW, if you sort on a particular property, pages that have a blank value for that property will not be displayed in the results. With Cargo, blank values are handled in the same way as all other values.
  • String operations. With Cargo, you can do string operations and comparisons within queries, like finding all rows that have a value for some field with exactly five characters.
  • Complex logical combinations of AND, OR and NOT.
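The four kinds of query listed above can be sketched in plain SQL. The example below again uses Python's sqlite3 module as a stand-in for the wiki's database, with a hypothetical Cities table, and assumes blank values are stored as NULL. The pages() helper is illustrative only; in a wiki, the same conditions would go into a query's "where" and "order by" clauses, and function names can differ between database systems (SQLite's LENGTH is used here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "Atlantis" has blank (NULL) Country and Population.
conn.execute("CREATE TABLE Cities (_pageName TEXT, Country TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO Cities VALUES (?, ?, ?)",
    [
        ("Paris", "France", 2100000),
        ("Tokyo", "Japan", 14000000),
        ("Atlantis", None, None),
        ("Lyon", "France", 520000),
    ],
)

def pages(where: str = "1", order_by: str = "_pageName") -> list:
    """Return page names matching a WHERE clause (demo helper, trusted input only)."""
    sql = f"SELECT _pageName FROM Cities WHERE {where} ORDER BY {order_by}"
    return [row[0] for row in conn.execute(sql)]

# Finding blank values: pages with no value for a field.
blank = pages("Country IS NULL")

# Sorting with blank values: the row with a blank Population is still returned.
sorted_all = pages(order_by="Population")

# String operations: page names with exactly five characters.
five_chars = pages("LENGTH(_pageName) = 5")

# Logical combination of AND, OR and NOT.
combined = pages(
    "(Country = 'France' OR Country = 'Japan') AND NOT Population > 1000000"
)

print(blank, sorted_all, five_chars, combined)
```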

Easier data structure setup

  • No properties. Cargo does not use properties and property pages; thus, its data structure is quite a bit more minimal, since properties can easily make up 95% or more of the pages in an SMW data structure.
  • No subobjects. As noted above, with SMW you need to use either subobjects or "internal objects" (essentially the same thing) to store an array of data within a page. In Cargo, all data is stored the same way, eliminating this complication.
  • Automatic display of all data. From the page Special:CargoTables, users can click through and see a tabular display of all the Cargo data. In SMW, queries would have to be created manually to show all this data.
  • Automatic drilldown filters. In Cargo, filters for drilldown are set automatically, based on the fields in each table and their types. In SMW (really Semantic Drilldown), each such filter has to be defined manually.

Faster performance

Cargo uses a simple database structure, instead of Semantic MediaWiki's more complex, custom DB structure (assuming a triplestore is not used), so it's not surprising that Cargo's querying would be at least somewhat faster than SMW's. Some testing has shown Cargo's querying to be around 30-50% faster than SMW's, for equivalent queries. You can see more details at the page Performance testing.

Other

  • Full text searching. If you are using MySQL, you can do a standard text search on the text of pages, on the text of files (PDF only), and potentially on other fields as well, within queries. This is not possible with Semantic MediaWiki.
  • Full text search within drilldown. Similarly, in Cargo's Special:Drilldown page, there is a search input for searching on the contents of pages and uploaded PDF files. This is not possible with Semantic Drilldown.
  • Easier querying by outside systems. Both SMW and Cargo provide an API for querying their contents by external systems. But with Cargo, you can also have such systems query the database tables directly, if they have the proper permissions. (This is also doable in SMW, but extracting data from its tables through direct SQL queries is difficult.)
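As a rough sketch of the API route for outside systems, the snippet below builds a request URL without sending it. It assumes the Cargo API is exposed as the MediaWiki API module "cargoquery" with "tables", "fields", "where" and "limit" parameters; check these names against your installed version. The wiki address, table and field names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def cargo_api_url(api_base, tables, fields, where=None, limit=50):
    """Build (but do not send) a query URL for a wiki's api.php endpoint.

    Parameter names follow the assumed "cargoquery" API module.
    """
    params = {
        "action": "cargoquery",
        "format": "json",
        "tables": tables,
        "fields": fields,
        "limit": str(limit),
    }
    if where:
        params["where"] = where
    return f"{api_base}?{urlencode(params)}"

# Hypothetical wiki and table; an external system would fetch this URL
# with any HTTP client and parse the JSON response.
url = cargo_api_url(
    "https://wiki.example.org/w/api.php",
    tables="Cities",
    fields="_pageName,Population",
    where="Country = 'France'",
)
print(url)
```

A system with direct database access would skip the API entirely and run an ordinary SELECT against the table that Cargo created.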

See also