This HTML5 document contains 26 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix   IRI
dcterms  http://purl.org/dc/terms/
n16      http://vos.openlinksw.com/dataspace/owiki/wiki/VOS/VirtRDFBulkLoaderWithDelete/
atom     http://atomowl.org/ontologies/atomrdf#
foaf     http://xmlns.com/foaf/0.1/
n8       http://vos.openlinksw.com/dataspace/services/wiki/
opl      http://www.openlinksw.com/schema/attribution#
n2       http://vos.openlinksw.com/dataspace/owiki/wiki/VOS/
dc       http://purl.org/dc/elements/1.1/
n18      http://docs.openlinksw.com/virtuoso/databaseadmsrv.html#
n19      http://docs.openlinksw.com/virtuoso/
n13      http://vos.openlinksw.com/dataspace/dav#
rdfs     http://www.w3.org/2000/01/rdf-schema#
n7       http://rdfs.org/sioc/services#
n6       http://vos.openlinksw.com/dataspace/person/dav#
sioct    http://rdfs.org/sioc/types#
n10      http://vos.openlinksw.com/dataspace/owiki/wiki/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
n21      http://vos.openlinksw.com/dataspace/owiki#
xsdh     http://www.w3.org/2001/XMLSchema#
n12      http://vos.openlinksw.com/dataspace/%28NULL%29/wiki/VOS/
n14      http://vos.openlinksw.com/dataspace/person/owiki#
sioc     http://rdfs.org/sioc/ns#

Statements

Subject Item
n2:VirtRDFBulkLoaderWithDelete
rdf:type
sioct:Comment atom:Entry
dcterms:created
2017-06-13T05:44:17.857230
dcterms:modified
2017-06-13T05:44:17.857230
rdfs:label
VirtRDFBulkLoaderWithDelete
foaf:maker
n6:this n14:this
dc:title
VirtRDFBulkLoaderWithDelete
opl:isDescribedUsing
n16:sioc.rdf
sioc:has_creator
n13:this n21:this
sioc:content
%META:TOPICPARENT{name="VirtBulkRDFLoader"}%
---+ Delta-aware bulk loading of datasets into Virtuoso

%TOC%

---++ Why

High-performance bulk revision of existing data, on a par with simple bulk insertion of similar data, is best achieved by finding the difference (the "delta") between an existing graph or dataset and the new graph or dataset being loaded, and then applying that differential or "graph delta" to the quad store.

---++ What

Given an existing dataset hosted by Virtuoso, identified by a named graph IRI, and a new one being loaded from N-Quad files in the filesystem, Virtuoso's bulk load process can automatically determine the differences between the two datasets and quickly apply the relevant <code>INSERTs</code>, <code>UPDATEs</code>, and <code>DELETEs</code> to the existing dataset.

The Virtuoso RDF Bulk Loader is told to use this "graph delta" load process with a special option called <b><code><nowiki>with_delete</nowiki></code></b>, applied in the <code>[[http://docs.openlinksw.com/virtuoso/fn_ld_dir.html][ld_dir()]]</code> or <code>[[http://docs.openlinksw.com/virtuoso/fn_ld_dir_all.html][ld_dir_all()]]</code> commands.

---++ How

---+++ Prerequisites

   * Virtuoso Commercial Edition, Release 06.04.3134 or later, is required.%BR%%BR%
   * The <code>with_delete</code> option is available in
      * Release 6.x, only in cluster mode
      * Release 7.x, in both cluster and single-server mode%BR%%BR%
   * The input must be N-Quad datasets in which every graph name is specified within the dataset. Graphs need not be in any particular order, but all triples from each graph must be together. Triples from different graphs cannot be intermingled. (In SQL terms, <code>GROUP BY</code> graphname; no <code>ORDER BY</code> is necessary.)%BR%%BR%
   * Virtuoso must be allocated at least 200 bytes of RAM per quad in the dataset being loaded.
     Loading large graphs with this option can therefore have a significant impact on Virtuoso's memory use; for example, a dataset of 1 billion quads calls for roughly 200 GB of RAM.%BR%%BR%
   * The Virtuoso server must be running with a [[http://docs.openlinksw.com/virtuoso/databaseadmsrv.html#configsrvstupfiles][default transaction isolation level]] of 2, <code>READ COMMITTED</code>. Ensure that the <code>[Parameters]</code> section of the Virtuoso configuration file (by default, <code>virtuoso.ini</code>) includes the following entry, then restart the Virtuoso server.
<verbatim>
DefaultIsolation = 2
</verbatim>
   * Apply the following lock mode settings before using the <code><nowiki>with_delete</nowiki></code> option:
<verbatim>
cl_exec ('__dbf_set (''lock_escalation_pct'', 200)');
cl_exec ('__dbf_set (''enable_distinct_key_dup_no_lock'', 1)');
</verbatim>
   * The dataset files must not contain multiple graphs which have the same name but contain different triples. Violating this rule results in unpredictable triple counts, because the outcome depends on which dataset file is being loaded on a given thread, which is non-deterministic.%BR%%BR%

---+++ Basic usage

Using the <code>[[http://docs.openlinksw.com/virtuoso/fn_ld_dir.html][ld_dir()]]</code> or <code>[[http://docs.openlinksw.com/virtuoso/fn_ld_dir_all.html][ld_dir_all()]]</code> commands as usual, set the <code><nowiki>target_graph</nowiki></code> argument to <code><nowiki>'with_delete'</nowiki></code> for each dataset file specified in <code><nowiki>ll_file</nowiki></code> that is known to require an update/reload. For example --
<verbatim>
ld_dir ('/data8/2848260', '%.gz', 'with_delete');
ld_dir_all ('/data8/', '%.gz', 'with_delete');
</verbatim>
Once all are set, run the <code><nowiki>rdf_loader_run()</nowiki></code> or <code><nowiki>cl_exec('rdf_ld_srv()')</nowiki></code> commands to start the update/reload.
As many <code><nowiki>rdf_loader_run()</nowiki></code> or <code><nowiki>cl_exec('rdf_ld_srv()')</nowiki></code> commands can be invoked as there are threads/cores available across the machines on which the Virtuoso cluster is running, for fast parallel loading of the datasets, as would typically be done for the initial bulk load.

All RDF loader threads can be stopped using the following command; all currently running threads are then allowed to complete and exit:
<verbatim>
rdf_load_stop();
</verbatim>

---+++ Diagnostics

A diagnostic log of the <code><nowiki>with_delete</nowiki></code> activity may be written to a file called <code><nowiki>g_log.txt</nowiki></code> on each cluster instance.

   * To enable this log, run the following command:
<verbatim>
cl_exec ('__dbf_set (''enable_g_replace_log'',1)');
</verbatim>
   * To disable this log, run the following command:
<verbatim>
cl_exec ('__dbf_set (''enable_g_replace_log'',0)');
</verbatim>

---++ Related

   * [[VirtBulkRDFLoader][Virtuoso RDF Bulk Loader]]
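The dataset prerequisites described above (every quad names its graph, all quads of a graph are contiguous, and at least 200 bytes of RAM are available per quad) can be checked before a load is started. The following Python sketch is not part of Virtuoso. The names <code>scan_nquads</code>, <code>min_ram_bytes</code>, and <code>QUAD_RE</code> are hypothetical, introduced only for illustration, and the line pattern is a simplification that covers common N-Quads forms rather than the full grammar.

```python
import re

# Simplified N-Quads line pattern (an assumption, not the full grammar):
# subject, predicate, object (IRI, blank node, or quoted literal with an
# optional datatype or language tag), optional graph label, final dot.
QUAD_RE = re.compile(
    r'^\s*(?P<s>\S+)\s+(?P<p>\S+)\s+'
    r'(?P<o>"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?|\S+)'
    r'(?:\s+(?P<g><[^>]*>|_:\S+))?\s*\.\s*$'
)

# Figure taken from the prerequisites: at least 200 bytes of RAM per quad.
BYTES_PER_QUAD = 200


def scan_nquads(lines):
    """Return (quad_count, problems) for an iterable of N-Quads lines.

    A problem is reported when a line cannot be parsed, when a statement
    has no graph name, or when a graph reappears after quads from another
    graph (that is, the file is not grouped by graph).
    """
    quad_count = 0
    finished = set()   # graphs whose contiguous run has already ended
    current = None     # graph of the run being read right now
    problems = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = QUAD_RE.match(stripped)
        if match is None:
            problems.append(f"line {lineno}: not a recognisable statement")
            continue
        graph = match.group("g")
        if graph is None:
            problems.append(f"line {lineno}: no graph name")
            continue
        quad_count += 1
        if graph != current:
            if graph in finished:
                problems.append(
                    f"line {lineno}: graph {graph} resumes after other graphs"
                )
            if current is not None:
                finished.add(current)
            current = graph
    return quad_count, problems


def min_ram_bytes(quad_count):
    """Lower bound on RAM for the load, using the 200 bytes/quad figure."""
    return quad_count * BYTES_PER_QUAD
```

To inspect a compressed file such as those matched by <code>'%.gz'</code> in the examples above, the lines could be supplied by <code>gzip.open(path, 'rt', encoding='utf-8')</code>, where <code>path</code> is whichever dataset file is being checked. The check is per file only; it does not detect same-named graphs with different triples spread across several files.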
sioc:id
32f1154a6714b2b494bb0c3cd03594b7
sioc:link
n2:VirtRDFBulkLoaderWithDelete
sioc:has_container
n10:VOS
n7:has_services
n8:item
atom:title
VirtRDFBulkLoaderWithDelete
sioc:links_to
n12:VirtBulkRDFLoader n18:configsrvstupfiles n19:fn_ld_dir.html n19:fn_ld_dir_all.html
atom:source
n10:VOS
atom:author
n6:this
atom:published
2017-06-13T05:44:17Z
atom:updated
2017-06-13T05:44:17Z
sioc:topic
n10:VOS