"2017-06-13T05:48:33.027482"^^ . "VirtSetCrawlerJobsGuideSitemaps" . . . . . "2017-06-13T05:48:33Z" . . . . . . . . . . . . "VirtSetCrawlerJobsGuideSitemaps" . . . . . . . . . "%META:TOPICPARENT{name=\"VirtSetCrawlerJobsGuide\"}%\n---+Setting up a Content Crawler Job to retrieve Sitemaps\n\nThe following guide describes how to set up a crawler job that retrieves the content of a basic Sitemap whose source includes RDFa.\n\n 1 In the Virtuoso Conductor User Interface, i.e., http://cname:port/conductor, log in as the \"dba\" user.\n 1 Go to the \"Web Application Server\" tab.\n%BR%%BR%%BR%%BR%\n 1 Go to the \"Content Imports\" tab.\n%BR%%BR%%BR%%BR%\n 1 Click the \"New Target\" button.\n%BR%%BR%%BR%%BR%\n 1 In the form displayed:\n * Enter a name of your choice in the \"Crawl Job Name\" text-box: \n\nBasic Sitemap Crawling Example \n\n * Enter the URL of the site to be crawled in the \"Data Source Address (URL)\" text-box: \n\nhttp://psclife.pscdog.com/catalog/seo_sitemap/product/ \n\n * In the \"Local WebDAV Identifier\" text-box, enter the location in the Virtuoso WebDAV repository where the crawled content should be stored. For example, if user demo is available: \n\n/DAV/home/demo/basic_sitemap/\n\n * Choose the \"Local resources owner\" for the collection from the list-box, for example, user demo.\n * Select the \"Accept RDF\" check-box. 
\n%BR%%BR%%BR%%BR%%BR%\n 1 Click the \"Create\" button to create the import.\n%BR%%BR%%BR%%BR%\n 1 Click the \"Import Queues\" button.\n 1 For the \"Robot targets\" entry labeled \"Basic Sitemap Crawling Example\", click the \"Run\" button.\n 1 The target site is then crawled: the retrieved pages are stored locally in DAV, and any sponged triples are stored in the RDF Quad Store.\n%BR%%BR%%BR%%BR%\n 1 Go to the \"Web Application Server\" -> \"Content Management\" tab.\n%BR%%BR%%BR%%BR%\n 1 Navigate to the location of the newly created DAV collection:\n\n/DAV/home/demo/basic_sitemap/\n\n 1 The retrieved content is available in this location.\n%BR%%BR%%BR%%BR%\n\n\n---++Related\n\n * [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]\n * [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up a Content Crawler Job to Add RDF Data to the Quad Store]]\n * [[VirtSetCrawlerJobsGuideSemanticSitemaps][Setting up a Content Crawler Job to Retrieve Semantic Sitemaps (a variation of the standard sitemap)]]\n * [[VirtSetCrawlerJobsGuideDirectories][Setting up a Content Crawler Job to Retrieve Content from Specific Directories]]\n * [[VirtCrawlerSPARQLEndpoints][Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint]]" . . "2017-06-13T05:48:33Z" . . . . "8e3b28cf81a7848dc1ce50585dfeebf2" . . . . . . "2017-06-13T05:48:33.027482"^^ . "VirtSetCrawlerJobsGuideSitemaps" .