Virtuoso Sponger

What Is The Sponger?

The Virtuoso Sponger is the Linked Data middleware component of Virtuoso. It generates Linked Data from a variety of data sources, and supports a wide variety of data representation and serialization formats. The Sponger is transparently integrated into Virtuoso's SPARQL Query Processor where it delivers URI de-referencing within SPARQL query patterns, across disparate data spaces. It also delivers configurable smart HTTP caching services. Optionally, it can be used by the Virtuoso Content Crawler to periodically populate and replenish data within the native RDF Quad Store.

The Sponger is also a full-fledged HTTP proxy service, directly accessible via SOAP or REST interfaces.
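As a rough illustration of the REST-style access mentioned above, the sketch below builds a proxy URL that asks a Sponger instance to describe a target document. The `/about/<view>/` path layout, the `localhost:8890` host, and the helper name `sponger_proxy_url` are assumptions made for this example; check the paths exposed by your own installation before relying on them.

```python
from urllib.parse import quote


def sponger_proxy_url(base: str, target: str, view: str = "html") -> str:
    """Build a URL asking a Sponger proxy to describe `target`.

    Assumption: the proxy is mounted at <base>/about/<view>/<target-url>.
    """
    # Keep ':' and '/' unescaped so the target URL stays readable in the path;
    # other reserved characters (spaces, '?', '&', ...) are percent-encoded.
    return f"{base.rstrip('/')}/about/{view}/{quote(target, safe=':/')}"


url = sponger_proxy_url("http://localhost:8890", "http://example.com/page one")
print(url)
```

The resulting URL can be fetched with any HTTP client; the helper only assembles the string and makes no network call.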

OpenLink's broad portfolio of Linked-Data-aware products supports a number of routes for creating or consuming Linked Data. The Sponger provides a key platform for developers to generate quality data meshes from unstructured or semi-structured data sources.

Why is it Important?

A majority of the world's data currently resides in non-Linked-Data form. The Sponger delivers middleware that accelerates the bootstrap of the Semantic Data Web by unobtrusively generating Linked Data (typically in RDF form, today) from non-Linked-Data data sources. This "Swiss army knife" for on-the-fly Linked Data generation provides a bridge between the traditional Document Web and the Linked Data Web ("Data Web").

Sponging non-Linked-Data Web sources and converting their data content to Linked Data exposes that data in a canonical form for querying and inference, and enables fast and easy construction of Linked-Data-driven "mesh-ups" (as opposed to code-driven Web 2.0 mash-ups).

Linked Data extraction and instance data generation products that offer functionality similar to that demonstrated by the Sponger are also commonly referred to as "RDFizers."

How Does It Work?

Designed with a pluggable architecture, the Sponger's core functionality is provided by Cartridges. Each cartridge includes Data Extractors, which extract data from one or more data sources, and Ontology Mappers, which map the extracted data to one or more ontologies/schemas, en route to producing RDF Linked Data.

Cartridges are highly customizable, and can be developed using any language supported by the Virtuoso Server Extensions API. This enables generation of structured linked data from virtually any resource type, rather than limiting users to resource types supported by the default Sponger Cartridge collection bundled as part of the Virtuoso Sponger VAD package (cartridges_dav.vad).
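The extractor-plus-mapper split described above can be pictured with a small conceptual model. This is only a sketch in Python for illustration; real cartridges are written against the Virtuoso Server Extensions API, and the `Cartridge` class, the toy title extractor, and the Dublin Core mapping below are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Cartridge:
    """Hypothetical model: a data extractor paired with an ontology mapper."""
    name: str
    extract: Callable[[str], Dict[str, str]]                        # raw content -> facts
    map_to_ontology: Callable[[str, Dict[str, str]], List[Triple]]  # facts -> triples

    def sponge(self, uri: str, content: str) -> List[Triple]:
        # Extraction first, then mapping of the extracted facts to a vocabulary.
        return self.map_to_ontology(uri, self.extract(content))


def extract_title(content: str) -> Dict[str, str]:
    # Toy extractor: take the text between <title> and </title>, if present.
    start, end = content.find("<title>"), content.find("</title>")
    if start == -1 or end == -1:
        return {}
    return {"title": content[start + len("<title>"):end].strip()}


def map_dublin_core(uri: str, facts: Dict[str, str]) -> List[Triple]:
    # Toy mapper: express the extracted title with the Dublin Core vocabulary.
    dc_title = "http://purl.org/dc/elements/1.1/title"
    return [(uri, dc_title, facts["title"])] if "title" in facts else []


html_cartridge = Cartridge("html-title", extract_title, map_dublin_core)
triples = html_cartridge.sponge(
    "http://example.com/", "<html><title> Example </title></html>"
)
print(triples)
```

Keeping extraction and mapping as separate callables mirrors the idea that one extractor's output could be mapped to more than one ontology.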

The Sponger also includes a pluggable name resolution mechanism that enables Custom Resolvers for naming schemes (e.g., URNs) associated with protocols beyond HTTP. Examples of custom resolvers include:

URN handler | Sample URI                                | Resource Description | Linked Data View | Linked Data Graph  | Needs
DOI         | doi:10.1038/35057062                      | HTML Representation  | Linked Data View | Data Explorer View | hslookup plugin; and enabling of relevant mappers for html, pdf, xml, etc.
LSID        | urn:lsid:ubio.org:namebank:12292          | HTML Representation  | Linked Data View | Data Explorer View | None
OAI         | oai:dcmi.ischool.washington.edu:article/8 | HTML Representation  | Linked Data View | Data Explorer View | None
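The idea of pluggable name resolution can be sketched as a registry that maps a naming-scheme prefix to a handler. This is a conceptual illustration only: the `RESOLVERS` registry and `resolve` function are hypothetical, and the DOI handler simply rewrites to the public doi.org HTTP proxy, which is a stand-in for (not a description of) the hslookup-based handler listed in the table.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: scheme prefix -> function returning a dereferenceable URL.
RESOLVERS: Dict[str, Callable[[str], str]] = {
    # Stand-in handler: rewrite a DOI to the public doi.org HTTP proxy.
    "doi:": lambda uri: "https://doi.org/" + uri[len("doi:"):],
}


def resolve(uri: str) -> Optional[str]:
    """Return a dereferenceable URL for `uri`, or None if no resolver is registered."""
    lowered = uri.lower()  # scheme prefixes are matched case-insensitively
    for prefix, handler in RESOLVERS.items():
        if lowered.startswith(prefix):
            return handler(uri)
    return None


print(resolve("doi:10.1038/35057062"))
print(resolve("urn:lsid:ubio.org:namebank:12292"))  # no handler registered here
```

Supporting another scheme in this model means adding one more entry to the registry, which is the sense in which the mechanism is pluggable.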

Cache expiration is managed through the MinExpiration parameter in the virtuoso.ini file.
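To make the setting concrete, the sketch below reads MinExpiration from an INI fragment. The `[SPARQL]` section name and the value of 86400 seconds are assumptions made for this example, so confirm both against your own virtuoso.ini; the helper `min_expiration_seconds` is hypothetical, and a complete virtuoso.ini may not parse cleanly with Python's `configparser`.

```python
import configparser

# Assumed fragment; the section name and value are illustrative only.
INI_FRAGMENT = """
[SPARQL]
MinExpiration = 86400
"""


def min_expiration_seconds(ini_text: str, default: int = 0) -> int:
    """Return the MinExpiration value from an INI fragment, or `default` if absent."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve the case of option names
    parser.read_string(ini_text)
    return parser.getint("SPARQL", "MinExpiration", fallback=default)


print(min_expiration_seconds(INI_FRAGMENT))
```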

Installation Steps

  1. A default Virtuoso installation includes the cartridges VAD package, which contains all publicly-available Sponger cartridges and associated components. Check that it is installed using the System Admin -> Packages tab of the Virtuoso Conductor.
    • If it is listed as uninstalled, click the install button to the right of the package.
    • If the cartridges VAD is not listed, download the cartridges_dav.vad package and install it using the Conductor UI from the System Admin -> Packages tab, or by using iSQL:

      SQL> DB.DBA.VAD_INSTALL('tmp/cartridges_dav.vad',0);
      SQL_STATE  SQL_MESSAGE
      VARCHAR    VARCHAR
      _______________________________________________________________________________
      00000      No errors detected
      00000      Installation of "Linked Data Cartridges" is complete.
      00000      Now making a final checkpoint.
      00000      Final checkpoint is made.
      00000      SUCCESS
      6 Rows. -- 1078 msec.

  2. To enable data insertion into the Quad Store via SPARQL queries, assign the SPARQL_SPONGE privilege to the SPARQL user. (Note: more sophisticated security is available via WebID-based ACL protection of your SPARQL endpoint.)
  3. Configuring Sponger Cartridges
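Steps 1 and 2 above can be sketched as a pair of statements to run in iSQL. The helper below only assembles the statement text. The function name `setup_statements` is hypothetical, and the `GRANT SPARQL_SPONGE TO "SPARQL"` form reflects the usual Virtuoso role-grant syntax as best understood here, so verify it against your server's documentation before use.

```python
from typing import List


def setup_statements(vad_path: str, user: str = "SPARQL") -> List[str]:
    """Return the iSQL statements for installing the VAD and granting sponge rights."""
    return [
        # Step 1: install the cartridges package (same call as in the iSQL session above).
        f"DB.DBA.VAD_INSTALL('{vad_path}', 0);",
        # Step 2: allow the given account to sponge data into the Quad Store.
        f'GRANT SPARQL_SPONGE TO "{user}";',
    ]


for statement in setup_statements("tmp/cartridges_dav.vad"):
    print(statement)
```

The printed statements are intended to be pasted into an iSQL session run by an administrative account.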

Sponger Cartridges included in a Standard Virtuoso Installation

There are a few kinds of Cartridge, and many of each kind are included in a standard Virtuoso installation. A separate page provides a breakdown of OpenLink-supported Data Sources.

Extractor Cartridges

An Extractor Cartridge processes a Resource of a given format, extracting RDF according to rules appropriate to that format. External data does not come into play; only the content of the Resource fed to the Sponger is used.

Supported Standard Non-RDF Data Formats

These Cartridges handle open formats -- typically community-developed, openly-documented, and freely-licensed data structures.

Supported Vendor-specific Non-RDF Data Formats

These Cartridges handle closed formats -- typically proprietary; sometimes undocumented; possibly licensed to no one except the format originator. Data may not always be parsed as desired or expected, because many of these Cartridges required reverse-engineering of the data format in question.

Meta Cartridges

A Meta Cartridge submits a Resource to a third-party Web Service for processing. The returned RDF supplements the RDF generated by Extractor Cartridges and other Meta Cartridges. Locally generated RDF may also be submitted to the third-party services, instead of, or in addition to, the original Resource itself.

Default Sponger behavior is for all installed Meta Cartridges to be brought to bear on all submitted Resources.
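The default behavior described above, where every installed Meta Cartridge is applied to every Resource, can be modeled roughly as follows. This is a conceptual sketch: the third-party services are replaced by local stand-in functions, all names and vocabulary URIs are hypothetical, and passing the locally extracted triples to each Meta Cartridge is an assumption about how the inputs are shared.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# A meta cartridge takes the resource URI plus locally extracted triples
# and returns supplementary triples (in reality, obtained from a Web Service).
MetaCartridge = Callable[[str, Set[Triple]], Set[Triple]]


def apply_meta_cartridges(
    uri: str, extracted: Set[Triple], cartridges: Iterable[MetaCartridge]
) -> Set[Triple]:
    """Merge the output of every meta cartridge into the extracted graph."""
    graph = set(extracted)
    for cartridge in cartridges:
        graph |= cartridge(uri, extracted)
    return graph


def fake_tagging_service(uri: str, local: Set[Triple]) -> Set[Triple]:
    # Stand-in for a service that looks only at the resource itself.
    return {(uri, "http://example.org/vocab#tag", "demo")}


def fake_stats_service(uri: str, local: Set[Triple]) -> Set[Triple]:
    # Stand-in for a service that is sent the locally generated RDF.
    return {(uri, "http://example.org/vocab#localTripleCount", str(len(local)))}


local_triples = {("http://example.com/", "http://purl.org/dc/elements/1.1/title", "Example")}
graph = apply_meta_cartridges(
    "http://example.com/", local_triples, [fake_tagging_service, fake_stats_service]
)
print(sorted(graph))
```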

Sponger Cartridge-based, Dynamic Linked Data Cloud

[Image: Sponger Cloud diagram]

Sponger pragmas

Virtuoso's Sponger is a sophisticated piece of middleware that provides full Linked Data fidelity for pre-existing data objects or resources. This Linked Data is then accessible via HTTP-based Web Services, and SPARQL is enhanced with Sponger pragmas and some optional additions to the FROM clause. See the full list of supported pragmas and usage examples.
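As a minimal sketch of a pragma in use, the code below prefixes a query with `define get:soft "replace"`, which asks the Sponger to fetch the graph named in the FROM clause and replace any cached copy, and then URL-encodes it for a SPARQL endpoint. The endpoint address `http://localhost:8890/sparql` and both helper names are assumptions for this example; consult the pragma list for the options your version supports.

```python
from urllib.parse import urlencode


def sponging_query(source_uri: str, limit: int = 10) -> str:
    """Build a SPARQL query that asks the Sponger to fetch `source_uri` first."""
    return (
        'define get:soft "replace"\n'
        f"SELECT * FROM <{source_uri}>\n"
        f"WHERE {{ ?s ?p ?o }} LIMIT {limit}"
    )


def endpoint_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL endpoint URL."""
    return endpoint + "?" + urlencode({"query": query})


query = sponging_query("http://example.com/")
print(query)
print(endpoint_url("http://localhost:8890/sparql", query))
```

Running such a query requires the SPARQL_SPONGE privilege for the account that executes it; the helpers here only build strings and perform no request.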

Sponger Cartridge Configuration

Sponger Usage Examples

Other Related Pages
