. "VOSArticleWebScaleRDF" . . . . . . . . . . . "2017-06-13T05:50:14Z" . . . . "2017-06-13T05:50:14Z" . . "VOSArticleWebScaleRDF" . . . "2017-06-13T05:50:14.459628"^^ . "VOSArticleWebScaleRDF" . . "2017-06-13T05:50:14.459628"^^ . "%VOSWARNING%\n\n\n---+Towards Web-Scale RDF\n\nOrri Erling (Program Manager, OpenLink Virtuoso) and Ivan Mikhailov (Lead Developer, OpenLink Virtuoso) \noerling{at}openlinksw.com and imikhailov{at}openlinksw.com\n\nOpenLink Software, 10 Burlington Mall Road Suite 265 Burlington, MA 01803 U.S.A.%BR%\nhttp://www.openlinksw.com/\n\n%TOC%\n\n---++Abstract\n\nWe are witnessing the first stages of the document Web becoming a data Web, \nwith the implied new opportunities for discovering, re-purposing, \"meshing \nup,\" and analyzing linked data. There is an increasing volume of linked \nopen data, and the first data Web search engines are taking shape. Dealing \nwith queries against the nascent data Web may easily add two orders of \nmagnitude in computing power requirements on top of what a text search \nengine faces. Queries may involve arbitrary joining, aggregation, \nfiltering, and so forth, compounded by the need for inference and on-the-\nfly schema mapping.\n\nThis is the environment for which Virtuoso Cluster Edition is intended. \nThis paper presents the main challenges encountered and solutions arrived \nat during the development of this software.\n\nWe present adaptations of RDF load- and query-execution and query-planning \nsuited for distributed memory platforms, with special emphasis on dealing \nwith message latency and the special operations required by RDF.\n\n---++Introduction\n\nVirtuoso is a general-purpose RDBMS with extensive RDF adaptations.\n\nIn Virtuoso, RDF data may be stored as RDF quads (i.e., graph, subject, \npredicate, object tuples). All such quads are in one table, which may have \ndifferent indexing depending on the expected query load.\n\nRDF data may also be generated-on-demand by SPARQL queries against a \nvirtual graph mapped from relational data, which may reside in Virtuoso \ntables or tables managed by any third party RDBMS. The \"relational-to-RDF \nmapping\" capability is further described in [[http://www.w3.org/2007/03/RdfRDB/papers/erling.html][Declaring Linked Data Views of SQL Data]] [1]; \nthis paper limits itself to discussing physically-stored RDF quads.\n\nWe recognize two main use cases of RDF data, which we may call the \"Data \nWarehouse\" and the \"Open Web\" scenarios. The Data Warehouse is built to \nserve a specific application, and can be laid out as a collection of \nrelatively few graphs with well defined schemes. Since the application is \nknown, domain experts can specify what inference is relevant, and the \nresults of such inference can often be forward chained. Since data are \nloaded through custom ETL procedures, the identities of entities can often \nbe homogenized at load time, so that the same URI ends up standing for the \nsame thing even when the identifiers in the original data may differ.\n\nThe Open Web scenario is found when crawling data from the Web for search \nor Web analytics, or linked data mesh-ups. Data are often automatically \ndiscovered, provenance becomes important, and it is no longer possible to \nexhaustively list all graphs that may participate in a query's evaluation. \nForward chaining inferred data becomes problematic due to large volumes, \nheterogeneous schemes, relative abundance of owl:sameAs links, and so forth. 
Also, Web-scale data volumes will typically require redundant infrastructure for uptime due to expected equipment and network failures.

Virtuoso Cluster Edition is intended to be configurable for both use cases.

---++Database Engine

The Virtuoso DBMS and its main RDF-oriented features are described in [[http://CEUR-WS.org/Vol301/Paper%205%20Erling.pdf][RDF Support in the Virtuoso DBMS]] [2].

Virtuoso has a general-purpose relational database engine enhanced with RDF-oriented data types such as IRIs and language- and type-tagged strings. Virtuoso makes extensive use of [[http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing][bitmap indices to improve space efficiency]] [3]. The default index layout uses GSPO (graph, subject, predicate, object) as the primary key, and OPGS as a bitmap index. These two indices are usually adequate for dealing with queries where the graph is known.

For cases where the graph is left open, the recommended index layout is SPOG as primary key, with OPGS, GPOS, and POGS as bitmap indices. The bitmap index means that in the case of OPGS, for example, for each distinct OPG (object, predicate, graph), there is a bitmap with 1 bit corresponding to each subject which has object O as a value of property P in graph G.

With typical RDF data, such as DBpedia [4] version 3, the bitmap index takes about 60% of the corresponding non-bitmap index space.

Both regular and bitmap indices use key compression, which collapses 64-bit IDs into 16 bits when the ID is within an integer increment of 16 from a previous ID on the same page. Common prefixes of strings are also eliminated. An index compressed in this manner, using 64-bit IDs, takes 56% of the space of a non-compressed index with the same content but with 32-bit IDs.

After key compression is applied, using gzip gives a further gain of almost 50%; i.e., 95% of all 8K pages drop to under 4K. Many pages compress to less than this, but the percentage of pages that do not fit in the target compressed size must be kept small to maintain locality. The cost of compression is low — about 600 microseconds for compressing a page and about a quarter of this for uncompressing. Pages in cache must be kept uncompressed, since a random access of one triple out of hundreds of millions takes only around 4-5 microseconds for data in memory; thus, applying gunzip at each of the usually 4 index tree levels would increase the time to about 600 microseconds. Stream compression is therefore not suitable for the database's own page cache, but it does make for smaller files, easier backup, and better utilization of the hardware/OS disk cache.
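
To make the compression scheme concrete, here is a minimal sketch in Python of the two ideas just described: an ID close to its predecessor is stored as a short delta, and a string key drops the prefix it shares with the previous key. The function names, tags, and encoding details are simplifications for illustration only; they do not reflect the actual Virtuoso page format.

<verbatim>
import os

# Illustrative only; not the Virtuoso on-disk format.

def compress_ids(ids):
    """Collapse 64-bit IDs into short deltas when close to the previous ID on the page."""
    out, prev = [], None
    for i in ids:
        if prev is not None and 0 < i - prev <= 16:
            out.append(("delta16", i - prev))   # 2 bytes instead of 8
        else:
            out.append(("full64", i))           # full 64-bit value
        prev = i
    return out

def compress_strings(keys):
    """Eliminate the prefix each string key shares with its predecessor."""
    out, prev = [], ""
    for k in keys:
        shared = len(os.path.commonprefix([prev, k]))
        out.append((shared, k[shared:]))        # (shared prefix length, remaining suffix)
        prev = k
    return out
</verbatim>
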
---++Query Planning

Optimizing SPARQL queries against a quad store is not fundamentally different from optimizing SQL against a general-purpose RDBMS. Still, regular SQL optimization statistics do not provide the requisite level of detail for RDF use cases. For a cost model to work well with RDF, it must be able to estimate a match count for quads where any combination of G, S, P, and O is either equal to a constant, equal to a value known only at query time, or left open.

Pre-calculated histogram-style statistics do not answer these questions very well. This is why Virtuoso takes the approach of sampling the database at query optimization time. For example, when G and S are given, it is efficient to get a ballpark count of the matching P, O tuples by simply looking up the first match and counting matches on the same page. If the page ends with a match, also count the pages referenced from the parent of this page if they begin with a match, but do not read these; just assume their average row length to be the same as that of the first leaf page. For low-cardinality cases, this often gives exact counts, since all the matches are on one page; for high-cardinality cases, taking a single page sample can often hit within 70% of the count when the count is in the millions.

Once the cardinalities are known, costing the queries is no different from SQL and is handled by the same code.

One difference from SQL is that hash joins are almost never preferred for RDF data. The reasons are that there is almost always an index that can be used, and that a full table scan is almost never needed.
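
The sampling strategy described above can be pictured with the following toy model, in which an "index" is just a list of fixed-size pages of sorted key tuples. This is only an illustration of the estimation idea, not the Virtuoso optimizer code; the page layout and helper names are assumptions.

<verbatim>
from bisect import bisect_left

def matches(key, prefix):
    return key[:len(prefix)] == prefix

def estimate_matches(pages, prefix):
    """Estimate how many keys match a prefix (e.g., a bound G and S) from one page sample."""
    # Locate the page holding the first potential match.
    page_no = max(bisect_left([p[0] for p in pages], prefix) - 1, 0)
    while page_no < len(pages) and not any(matches(k, prefix) for k in pages[page_no]):
        page_no += 1
    if page_no == len(pages):
        return 0
    on_page = sum(1 for k in pages[page_no] if matches(k, prefix))
    if not matches(pages[page_no][-1], prefix):
        return on_page                      # the whole run fits on one page: exact count
    # Otherwise count the following pages whose first key still matches, without reading
    # them, and assume the same number of matches per page as on the sampled page.
    extra = 0
    for p in pages[page_no + 1:]:
        if not matches(p[0], prefix):
            break
        extra += 1
    return on_page + extra * on_page
</verbatim>

Called with a prefix such as a bound (G, S), this gives an exact count when the whole run of matches fits on one page, and a one-page extrapolation otherwise.
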
---++On Shared Memory, MP, and Latency

A multi-core processor running a database will not deliver linear scale even when there is one query per core and the queries do not have lock contention. As a rule of thumb, a 4-core Xeon runs 4 query streams in 1.3 times the time it takes to run one of the streams, supposing the queries do not hit exactly the same data in the same order, the data is already in memory, and there is no wait for locks. This is roughly true of Virtuoso and other databases.

This is a best case. The worst case can easily destroy any benefit of SMP. If a thread has to wait for a mutex, the cost of the wait can be several microseconds even if the mutex is released 100 ns after the wait starts. If there is a pool of worker threads to serve a job queue, any time the queue goes empty will also cost about this much. We remember that a single triple lookup is about 4 µs; thus, spinning up a thread to do a background single-triple operation makes no sense. At least a dozen operations have to be dispatched together to absorb the cost of waiting for a thread to start and eventually blocking to wait for its completion. One must never think that multithreading is an end in itself.

---++On Networks and Latency

A round trip of sending one byte back and forth between processes on the same CPU takes as much as 20 µs real time over Unix domain sockets. Adding thread scheduling to this, as would be found in any real server process, makes the round trip 50 µs. The same takes about 140 µs on a 1Gbit Ethernet with no other traffic. We have a test program which runs /n/ threads on each node of a cluster. Each of these threads sends a ping carrying /x/ bytes of payload to every other node of the cluster and waits for the reply from all before sending the next ping. This creates a full-duplex traffic pattern between all pairs of cluster nodes with intermittent synchronization.

---++++4 processes on a 4-core SMP box

| *Message length (bytes)* | *Aggregate round trips/s* | *Aggregate MB/s* |
| 1,000 | 37,000 | 74 |
| 10,000 | 17,200 | 329 |
| 100,000 | 2,380 | 455 |

---++++4 processes on 4 extra-large AMIs on Amazon EC2

| *Message length (bytes)* | *Aggregate round trips/s* | *Aggregate MB/s* |
| 1,000 | 10,000 | 20 |
| 10,000 | 3,500 | 67 |
| 100,000 | 950 | 181 |

The round trips/s figure is the total number of messages sent by all nodes, divided by 2 and divided by the duration of the interval in seconds. The MB/s figure is the sum total of data sent by all nodes during the interval, divided by the length of the interval.

Comparing these latencies with a single-triple in-memory random-access time of 4 µs shows that clustering is not an end in itself. The principal value of clustering is that there is no limit to the amount of RAM or to the aggregate RAM bandwidth.

Thus, it is evident that no benefit can be had from clustering unless messages are made to carry the maximum number of operations possible.

---++Partitioning vs Cache Fusion

Clustered databases have traditionally partitioned data between machines according to the values of one or more columns of a table. Another approach is cache fusion, as in [[http://www.oracle.com/technology/products/database/clustering/index.html][Oracle RAC]] [5]. With a cache-fusion database, all machines of the cluster see the same disks, but each keeps a local cache of the data and uses a cache-coherence protocol for managing concurrent updates. We have not measured Oracle RAC, but it is our impression that either an index lookup must be sent to the machine that holds the next page needed by the lookup, or the page must be transferred to the node making the lookup. In the latter case, we quickly get the same working set cached on all nodes. In the former case, we have a message round trip per page traversed, typically 4 round trips for a 4-level index tree. Either seems prohibitive in light of the fact that a single lookup takes a few microseconds when all the data is local and in memory. This is true of Oracle as well as Virtuoso.

For this reason, we decided to go for partitioning. Most databases specify partitioning at the table level. We specify it at the index level; thus, different keys of the same table may reside on different machines.

Proponents of cache fusion correctly point out that users do not know how to partition databases and that repartitioning a big database is next to impossible due to the resulting downtime. The difficulty is reduced in the case of RDF, since only a few tables are used for the data and they come pre-partitioned. The repartitioning argument is still valid in part.

We recognize that a Web-scale system simply cannot depend on a partitioning map that is set once and for all, nor require reinserting the data when reallocating hardware resources. Google's [[http://labs.google.com/papers/bigtable-osdi06.pdf][Bigtable]] [6] and Amazon's [[http://www.scs.stanford.edu/08sp-cs144/sched/readings/amazon-dynamo-sosp2007.pdf][Dynamo]] [7] each address this in different ways.

With Virtuoso, we use hash partitioning where the hash picks a logical partition out of a space of /n/ logical partitions, where /n/ is a number several times larger than the expected maximum machine count. Each logical partition is then assigned to a physical machine. When the machine allocation changes, logical partitions may be moved between nodes. While a partition is being moved, it continues to be served from the machine initially hosting it, but a special log is kept for updates that hit already-copied rows of the partition. Once the copy is complete, the partition is made read-only, the log is applied to the new host, and subsequent queries are routed to the new host of the logical partition. Repartitioning is still a fairly heavy operation but does not involve downtime. Since one database file-set hosts many logical partitions, unequal slices can be allocated according to hardware capacity. Still, more flexibility could be had if each logical partition had its own database file-set; then moving a partition would be a file copy instead of a database insert plus a delete of the logical content. This arrangement may be implemented later; it was not done now because it involves more code.
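
A minimal sketch of the logical-partitioning scheme may help picture how keys are routed and how a partition move only changes the logical-to-physical assignment. The class, the modulus, and the choice of hash below are illustrative assumptions, not Virtuoso internals.

<verbatim>
import hashlib

N_LOGICAL = 1024                      # assumed: several times the expected maximum host count

class ClusterMap:
    """Maps partitioning-key values to logical partitions, and logical partitions to hosts."""

    def __init__(self, hosts):
        # Round-robin assignment for the sketch; in practice unequal slices can be
        # assigned according to hardware capacity.
        self.owner = {p: hosts[p % len(hosts)] for p in range(N_LOGICAL)}

    def partition_of(self, key_value):
        digest = hashlib.md5(repr(key_value).encode()).digest()
        return int.from_bytes(digest[:4], "big") % N_LOGICAL

    def host_of(self, key_value):
        return self.owner[self.partition_of(key_value)]

    def move_partition(self, partition, new_host):
        # In the real system the move is done on-line: copy rows, log updates to
        # already-copied rows, make the partition read-only, apply the log, then switch.
        self.owner[partition] = new_host

# Example: route a subject ID to the host holding its slice of an S-partitioned index.
cmap = ClusterMap(["host1", "host2", "host3", "host4"])
print(cmap.host_of(987654321))
</verbatim>
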
---++Latency-Tolerant Load and Query Execution
---+++Load

When loading RDF data, the database must translate between IRIs and literals and their internal IDs. It must then insert the resulting quad into each index. With a single process, as long as no data needs to be read from disk, the load rate is about 15 Kt (kilotriples) per second per core.

Making a round trip per triple is out of the question. The loader takes a batch of 10,000 triples and then, for each unique IRI or literal, sends a request to allocate or look up its ID to the cluster node responsible for the partition given by the name. Whenever all the fields of a triple are known, each index entry of the triple is put into the inserts queued for the box holding the relevant partition. In this way, a batch of arbitrarily many triples can be inserted in a maximum of 4 round trips, each round trip consisting of messages that evenly fan out between machines.

As a result, even when all processes are on a single SMP box, clustered load is actually faster than single-process load. The reason is that single-process load suffers from waits for serializing access to shared data structures in the index. We remember that a single mutex wait takes as long as a full single-key insert, i.e., 5-6 µs.
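
Roughly, the batching can be pictured as below: group a batch of parsed triples, resolve all distinct IRIs and literals with one ID-allocation message per target host, then fan the index entries out to the hosts holding each partition. The helper names (host_of, resolve_ids, send_inserts) and the per-index partitioning columns are assumptions made for the illustration, not the actual loader API.

<verbatim>
from collections import defaultdict

def load_batch(triples, graph, host_of, resolve_ids, send_inserts):
    """Insert one batch of (s, p, o) triples with only a few round trips.

    host_of(name_or_id) -> host, resolve_ids(host, names) -> {name: id}, and
    send_inserts(host, index, rows) stand in for the partition map and cluster RPCs."""
    # 1. Group the distinct IRIs/literals by the partition of their name and resolve
    #    them with one ID-allocation message per target host.
    names = {x for s, p, o in triples for x in (s, p, o)} | {graph}
    per_host = defaultdict(set)
    for name in names:
        per_host[host_of(name)].add(name)
    ids = {}
    for host, batch in per_host.items():
        ids.update(resolve_ids(host, sorted(batch)))
    # 2. Queue every index entry of every quad for the host holding its partition;
    #    the partitioning columns per index are illustrative (GSPO on S, OPGS on O).
    queues = defaultdict(list)
    for s, p, o in triples:
        g_id, s_id, p_id, o_id = ids[graph], ids[s], ids[p], ids[o]
        queues[(host_of(s_id), "GSPO")].append((g_id, s_id, p_id, o_id))
        queues[(host_of(o_id), "OPGS")].append((o_id, p_id, g_id, s_id))
    for (host, index), rows in queues.items():      # one insert message per host and index
        send_inserts(host, index, rows)
</verbatim>
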
---+++Query

An RDF query primarily consists of single-key lookups grouped in nested-loop joins. Sometimes there are also bitmap intersections. Most result-set columns are calculated by function calls, since the internal IDs of IRIs and objects must be translated to text for return to the application.

The basic query is therefore a pipeline of steps, where most steps are individually partitioned operations. Sometimes consecutive steps can be partitioned together and dispatched as a unit.

The pattern

{ ?x a ub:Professor . ?x teacher_of <student> }

is a bitmap intersection where the Professor bits are merge-intersected with the teacher_of bits of <student>.

{ ?x a ub:Professor . ?x teaches_course ?c }

is a loop join starting with the Professor bitmap and then retrieving the courses taught from an index.

The whole query

select * where { ?x a ub:Professor ; ub:advisorOf ?y }

is a pipeline of 4 steps:
   1 translating the IRIs of the constants to IDs,
   1 getting the professors,
   1 getting the students they advise, and
   1 translating the IDs to text.

select * where { ?x a ub:Professor ; ub:advisorOf ?y ; ub:telephone ?tel }

is still a pipeline of 4 steps, because the ub:advisorOf and ub:telephone property retrievals are co-located: they have the same subject and the GSPO index is partitioned on subject.

The results have to be retrieved in deterministic order for result-set slicing. If there is an explicit ORDER BY or an aggregate, this is no longer the case, and results can be processed in the order they become available.

Each step of the pipeline takes /n/ inputs from the previous stage, partitions them, and sends a single message to each cluster node involved. If intermediate sets are large, they are processed in consecutive chunks. Execution of pipeline steps may overlap in time, and generally a step is divided over multiple partitions.

Normally, one thread per query per node is used. Making too many threads will simply congest the index due to possible mutex waits. On an idle machine, it may make sense to serve a batch of lookups on two threads instead of one, though. Further, since requests come in batches, if a lookup requires a disk read, the disk read can be started in the background and the next index lookup started, until this too would need disk, and so on. This has the benefit of sorting a random set of disk cache misses into an ascending read sequence.

---++Distributed Pipe and Map-Reduce

As noted before, RDF queries operate with IDs but must return the corresponding text. This implies a partitioned function call for each result column. Virtuoso SQL has a generic partitioned pipe feature. This takes rows of function/argument pairs, partitions them by some feature of the arguments, and returns the results for each input row once all the functions on the row have returned. This may be done preserving order or as results become available. It is also possible to block waiting for the whole pipe to be empty. The operations may have side effects and may either commit singly or be bound together in a single distributed transaction.

Aside from returning a result, a partitioned pipe function may return a set of follow-up functions and their arguments. These get partitioned and dispatched in turn. Thus, this single operation can juggle multiple consecutive map or reduce steps. There is a SQL procedure language API for this, but most importantly, the SQL compiler generates these constructs automatically when function calls occur in queries.
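
A very small in-process model of the partitioned pipe may help picture the mechanism (the names are invented for the illustration; real dispatch happens via cluster messages): each input row is a list of (function, argument) pairs, calls are grouped by the partition of their argument, results are collected per row, and any follow-up calls returned by a function are partitioned and dispatched in turn, which is what allows consecutive map or reduce steps to be chained.

<verbatim>
from collections import defaultdict

def run_pipe(rows, partition_of):
    """Toy partitioned pipe: each row is a list of (func, arg) pairs.

    func(arg) returns (value, followups); followups is a list of further
    (func, arg) pairs that get partitioned and dispatched in turn."""
    results = [[] for _ in rows]                    # collected values, one list per input row
    pending = [(i, func, arg) for i, row in enumerate(rows) for func, arg in row]
    while pending:
        batches = defaultdict(list)                 # group calls by target partition
        for call in pending:
            batches[partition_of(call[2])].append(call)
        pending = []
        for _, batch in batches.items():            # in reality: one message per cluster node
            for i, func, arg in batch:
                value, followups = func(arg)
                results[i].append(value)
                pending.extend((i, f, a) for f, a in followups)
    return results                                  # a row is complete once its calls return

# Example: translate internal IDs to text, partitioned by ID.
texts = {1: "ub:Professor", 2: "ub:worksFor"}
rows = [[(lambda x: (texts[x], []), 1)], [(lambda x: (texts[x], []), 2)]]
print(run_pipe(rows, partition_of=lambda x: x % 4))   # [['ub:Professor'], ['ub:worksFor']]
</verbatim>
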
---++Inference for the Web — The Blessing and Bane of "sameAs"

When there is a well-understood application and data is curated before import, entailed facts may often be forward-chained and identifiers made consistent. In a multi-user Web scenario, it is not possible for everybody to materialize the consequences of their particular rules over all the data. Thus, inference must take place when needed, not in anticipation of maybe being needed sometime.

Sub-properties and subclasses are easy to deal with at query run time. Given the proper pragmas, Virtuoso SPARQL will take

{ ?x a ub:Professor . ?x ub:worksFor ?dept }

and generate the code for first looping over all subclasses of ub:Professor, and then over all sub-properties of ub:worksFor. This stays a two-step pipeline, since the cluster node running the query knows the subclasses and sub-properties. With some luck, assistant professors and full professors will be in different partitions, adding some lateral parallelism to the operation.

The cases of "sameAs", and of transitive properties in general such as part-of, are more complex. The principal problem is that for a pattern like {<subpart> part-of <whole>}, it is not self-evident whether one should go from <subpart> up or from <whole> down. Also, the cardinalities at each level, as well as the depth of the tree, are hard to guess.

"sameAs" is specially supported as an intermediate query graph node. It has no special cost model, but it will take all fixed fields of the next join step and expand them through their "sameAs" statements, to full transitive closure. The feature is enabled for the whole query or for a single triple pattern with a SPARQL pragma. In a cluster situation, it is possible to just initialize the "sameAs" expansion when execution first reaches that place and to continue with the value at hand as normal. In this event, if there are no "sameAs" statements, no extra pipeline step is added; the existing step just gets two more operations: one looking for ?sas owl:sameAs ?thing and the other for ?thing owl:sameAs ?sas. If synonyms are found, they can be fed back into the step.
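
The run-time expansion can be pictured as follows: take a fixed value of the next join step, look up its synonyms in both directions, and keep feeding newly found synonyms back until the transitive closure is complete. The lookup function below is a stand-in for the two extra index operations described above; the sketch is illustrative, not the engine's implementation.

<verbatim>
def same_as_closure(value, lookup_same_as):
    """Expand one fixed value into its full owl:sameAs closure.

    lookup_same_as(v) stands in for the two index lookups
    { v owl:sameAs ?x } and { ?x owl:sameAs v } and returns the synonyms found."""
    closure = {value}
    frontier = [value]
    while frontier:                      # feed newly found synonyms back into the step
        found = set()
        for v in frontier:
            found.update(lookup_same_as(v))
        frontier = list(found - closure)
        closure.update(found)
    return closure

# Example with a toy synonym table; with no sameAs statements the closure stays a
# single value and no extra pipeline step is needed.
synonyms = {"dbpedia:Berlin": {"freebase:Berlin"}, "freebase:Berlin": {"dbpedia:Berlin"}}
print(same_as_closure("dbpedia:Berlin", lambda v: synonyms.get(v, set())))
</verbatim>
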
---++On Redundancy

A Web-scale RDF store will inevitably be quite large. One may count on 16GB of RAM per machine and about 1 billion triples per 16GB of RAM to keep a reasonable working set. 100 billion triples would thus require 100 machines. Of course, fitting arbitrarily many triples on disk is possible, but when the memory-to-disk ratio deteriorates, running queries of any complexity on-line is no longer possible.

As a basis for the above, one may consider that DBpedia, with 198M triples, is about 2M database pages, or 16GB without gzip. If data have strong locality, then about 5 times this could fit on a box without destroying the working set. As machines are multiplied, failures become more common and failover becomes important.

We address this by allowing each logical partition to be allocated on multiple nodes. At query time, a randomly selected copy of the partition is used to answer the query if the data is not local. At update time, all copies are updated in the same transaction. This is transparent and is used, for example, for all schema information, which is replicated on all nodes.

Storing each partition in duplicate or triplicate has little effect on load rate and can balance query load. Fault tolerance is obtained as a bonus. At present, the replicated storage is in regular use, but a specific RDF adaptation of it, with administration tools, automatic reconstruction of failed partitions, and so forth, remains to be done.

---++Some Metrics
---+++Load

When loading data at a rate of 40 Ktriples/s, the network traffic is 170 messages/s and the aggregate throughput is 10MB/s. Since the load is divided evenly over all node-to-node connections, there is no real network congestion, and scale can be increased without hitting a network bottleneck.

---+++Query

We have run the LUBM query mix against a 4-process Virtuoso cluster on one 4-core SMP box. With one test driver attached to each of the server processes, we get 330% out of a possible 400% CPU load on the servers and 30% on the test drivers. During the test, the cluster interconnect cross-sectional traffic is 1620 messages/s at 18MB/s, while the aggregate query rate is 34 queries/s.

We see that we are not even near the maximum interconnect throughputs described earlier and that we can run complex queries with reasonable numbers of messages, about 1620/34 = 47 messages per query. The count includes both request and response messages.

The specifics of the test driver and query mix are given in the [[http://www.openlinksw.com/weblog/oerling/?id=1284][Virtuoso LUBM Load]] [8] blog post. The only difference was that a Virtuoso Cluster v6 instance was used instead.

---++Linked Data Applications

As of this writing, OpenLink hosts [[http://www.cs.vu.nl/~pmika/swc/btc.html][several billion triples worth of linked open data]] [9]. These are currently being transferred to Virtuoso Cluster v6 servers. In addition, the data aggregated from the Web by [[http://zitgist.com/][Zitgist]] are being moved to Virtuoso Cluster v6. Experiments are also being undertaken with the [[http://sindice.com/][Sindice]] semantic Web search engine.

---++Future Directions

The linked data business model will have to do with the timeliness and quality of data and references. Data are becoming a utility. Thus far, there has been text search at arbitrary scale. Next, there will be analytics and mesh-ups at Web scale. This requires a cloud data and cloud computing model, since no single data center of Google, Yahoo, or any other can accommodate such a diverse and unpredictable load. Thus, the ones needing the analysis will have to pay for the processing power, but this must be adaptive and demand-based. Our work is to provide rapid deployment of arbitrary-scale RDF and other database systems for the cloud. This also involves automatic partitioning and repartitioning, as mentioned earlier. Google and Amazon have work in this direction, but we may be the first to provide Bigtable- or Dynamo-like automatic adaptation for a system with general-purpose relational transaction semantics and full-strength query languages.

---++Conclusions

Beyond the work described above, adapting query-planning cost models to data that increasingly involves inference will be relevant for backward-chaining support of more and more complex inference steps. Also, we believe that common graph algorithms such as shortest-path, spanning-tree, and traveling-salesman may have to become query-language primitives, because implementing them efficiently in a cluster environment is non-trivial.

---++Appendix A — Metrics and Environment

We use version 2 of DBpedia [4] as a sample data set for RDF storage space unless otherwise indicated. When CPU speeds are discussed, they have been measured with a 2GHz Intel Xeon 5130 unless otherwise indicated. Networks are 1Gbit Ethernet with Linksys switches.

---++References

   1 Orri Erling: [[http://www.w3.org/2007/03/RdfRDB/papers/erling.html][Declaring Linked Data Views of SQL Data]], <[[http://www.w3.org/2007/03/RdfRDB/papers/erling.html][http://www.w3.org/2007/03/RdfRDB/papers/erling.html]]>
   1 Orri Erling, Ivan Mikhailov: [[http://CEUR-WS.org/Vol301/Paper%205%20Erling.pdf][RDF Support in the Virtuoso DBMS]]. In Franconi et al. (eds), Proc. of the 1st Conference on Social Semantic Web, Leipzig, Germany, Sep 26-28, 2007, CEUR Proceedings, ISSN 1613-0073, <[[http://CEUR-WS.org/Vol301/Paper%205%20Erling.pdf][http://CEUR-WS.org/Vol301/Paper%205%20Erling.pdf]]>.
   1 Orri Erling: [[http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing][Advances in Virtuoso RDF Triple Storage (Bitmap Indexing)]], <[[http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing][http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing]]>
   1 Soeren Auer, Jens Lehmann: [[http://www.informatik.uni-leipzig.de/~auer/publication/ExtractingSemantics.pdf][What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content]]. In Franconi et al. (eds), Proceedings of the European Semantic Web Conference (ESWC 2007), LNCS 4519, pp. 503-517, Springer, 2007, <[[http://www.informatik.uni-leipzig.de/~auer/publication/ExtractingSemantics.pdf][http://www.informatik.uni-leipzig.de/~auer/publication/ExtractingSemantics.pdf]]>.
   1 Oracle Real Application Clusters, <[[http://www.oracle.com/technology/products/database/clustering/index.html][http://www.oracle.com/technology/products/database/clustering/index.html]]>
   1 Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: [[http://labs.google.com/papers/bigtable-osdi06.pdf][Bigtable: A Distributed Storage System for Structured Data]], <[[http://labs.google.com/papers/bigtable-osdi06.pdf][http://labs.google.com/papers/bigtable-osdi06.pdf]]>
   1 Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, et al.: [[http://www.scs.stanford.edu/08sp-cs144/sched/readings/amazon-dynamo-sosp2007.pdf][Dynamo: Amazon's Highly Available Key-value Store]], <[[http://www.scs.stanford.edu/08sp-cs144/sched/readings/amazon-dynamo-sosp2007.pdf][http://www.scs.stanford.edu/08sp-cs144/sched/readings/amazon-dynamo-sosp2007.pdf]]>.
   1 Orri Erling: [[http://www.openlinksw.com/weblog/oerling/?id=1284][Virtuoso LUBM Load]], <[[http://www.openlinksw.com/weblog/oerling/?id=1284][http://www.openlinksw.com/weblog/oerling/?id=1284]]>
   1 Data set for the Billion Triples Challenge, <[[http://www.cs.vu.nl/~pmika/swc/btc.html][http://www.cs.vu.nl/~pmika/swc/btc.html]]>