%META:TOPICPARENT{name="VirtSetCrawlerJobsGuide"}%
---+ Setting up a Content Crawler Job to Retrieve Content from Specific Directories

The following guide describes how to set up a crawler job that retrieves content from specific directories using Conductor.

 1 Go to the Conductor UI, for example at http://localhost:8890/conductor .
 1 Enter dba credentials.
 1 Go to "Web Application Server".
 1 Go to "Content Imports".
 1 Click "New Target".
 1 In the form shown, set the following values:
 * "Crawl Job Name":

Gov.UK data

 * "Data Source Address (URL)":

http://source.data.gov.uk/data/

 * "Local WebDAV Identifier" for an available user, for example demo:

/DAV/home/demo/gov.uk/

 * From the "Local resources owner" list, choose a user, for example demo.
 * Click the "Create" button.
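
The form above implies a simple mapping: everything the crawler fetches from below the "Data Source Address (URL)" is stored under the "Local WebDAV Identifier". A minimal Python sketch of that convention follows; the helper name =webdav_target= and its logic are illustrative only, not part of Conductor:

```python
from urllib.parse import urlparse

def webdav_target(source_url: str,
                  base_url: str = "http://source.data.gov.uk/data/",
                  dav_root: str = "/DAV/home/demo/gov.uk/") -> str:
    """Map a crawled URL to the local WebDAV path where its copy lands.

    Illustrative sketch: mirrors the convention that content fetched from
    under the Data Source Address is stored below the Local WebDAV
    Identifier, preserving the URL's path.
    """
    if not source_url.startswith(base_url):
        raise ValueError(f"{source_url} is outside the crawl root {base_url}")
    # Keep the URL path relative to the host root, including "data/" itself,
    # so the mirror ends up under /DAV/home/demo/gov.uk/data/... .
    relative = urlparse(source_url).path.lstrip("/")
    return dav_root + relative

# A file under /data/ is mirrored below the WebDAV folder:
print(webdav_target("http://source.data.gov.uk/data/example.csv"))
# → /DAV/home/demo/gov.uk/data/example.csv
```

This matches the later steps of the guide, where the retrieved content is browsed under the path DAV/home/demo/gov.uk/data.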
 1 As a result, the Robot target will be created.
 1 Click "Import Queues".
 1 For the "Robot target" labeled "Gov.UK data", click "Run".
 1 The status of each page will be shown: retrieved, pending, or waiting.
 1 Click "Retrieved Sites".
 1 The total number of pages retrieved will be shown.
 1 Go to "Web Application Server" -> "Content Management".
 1 Enter the path:

DAV/home/demo/gov.uk

 1 Go to the path:

DAV/home/demo/gov.uk/data

 1 The retrieved content will be shown.

---++ Related

 * [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]
 * [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up a Content Crawler Job to Add RDF Data to the Quad Store]]
 * [[VirtSetCrawlerJobsGuideSitemaps][Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa)]]
 * [[VirtSetCrawlerJobsGuideSemanticSitemaps][Setting up a Content Crawler Job to Retrieve Semantic Sitemaps (a variation of the standard sitemap)]]
 * [[VirtCrawlerSPARQLEndpoints][Setting up a Content Crawler Job to Retrieve Content from a SPARQL Endpoint]]
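
The Import Queues view described in the steps above reports each page as retrieved, pending, or waiting, and the Retrieved Sites view totals the retrieved pages. A tiny Python sketch of such a tally, using entirely hypothetical page data rather than anything queried from Conductor:

```python
from collections import Counter

# Hypothetical per-page statuses, as the Import Queues view might report them.
pages = {
    "http://source.data.gov.uk/data/a.csv": "retrieved",
    "http://source.data.gov.uk/data/b.csv": "retrieved",
    "http://source.data.gov.uk/data/c.csv": "pending",
    "http://source.data.gov.uk/data/d.csv": "waiting",
}

# Tally pages by status; the "retrieved" count corresponds to the
# total shown under Retrieved Sites.
status_counts = Counter(pages.values())
print(status_counts["retrieved"])  # → 2
```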