%META:TOPICPARENT{name="VirtSetCrawlerJobsGuide"}%
---+Setting up a Content Crawler Job to retrieve Sitemaps
The following guide describes how to set up a content crawler job that retrieves the content of a basic sitemap whose pages include RDFa.
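For reference, a basic sitemap of the kind this job consumes is a simple XML file listing page URLs. The snippet below is an illustrative example only — the URLs are placeholders, not taken from the site crawled in this guide:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/catalog/product/widget.html</loc>
    <lastmod>2010-01-15</lastmod>
  </url>
  <url>
    <loc>http://example.com/catalog/product/gadget.html</loc>
  </url>
</urlset>
```

The crawler fetches each `<loc>` URL; pages that embed RDFa are additionally sponged into the RDF Quad Store when the "Accept RDF" option described below is enabled.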
1 From the Virtuoso Conductor User Interface, i.e. http://cname:port/conductor, log in as the "dba" user.
1 Go to the "Web Application Server" tab.
1 Go to the "Content Imports" tab.
1 Click on the "New Target" button.
1 In the form displayed:
* Enter a name of your choice in the "Crawl Job Name" text-box:
Basic Sitemap Crawling Example
* Enter the URL of the site to be crawled in the "Data Source Address (URL)" text-box:
http://psclife.pscdog.com/catalog/seo_sitemap/product/ 
* Enter the location in the Virtuoso WebDAV repository where the crawled content should be stored in the "Local WebDAV Identifier" text-box; for example, if the user demo is available:
/DAV/home/demo/basic_sitemap/
* Choose the "Local resources owner" for the collection from the list-box available, for example, user demo.
* Select the "Accept RDF" check-box.
1 Click the "Create" button to create the import.
1 Click the "Import Queues" button.
1 For the "Robot targets" entry with label "Basic Sitemap Crawling Example", click the "Run" button.
1 This will result in the target site being crawled, with the retrieved pages stored locally in DAV and any sponged triples stored in the RDF Quad Store.
1 Go to the "Web Application Server" -> "Content Management" tab.
1 Navigate to the location of the newly created DAV collection:
/DAV/home/demo/basic_sitemap/
1 The retrieved content will be available in this location.
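The sitemap-driven retrieval performed by the job above can be sketched in a few lines of Python. This is only an illustration of the mechanism — not Virtuoso's actual crawler implementation — and the sitemap content shown is a hypothetical example:

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml):
    """Return the page URLs listed in a basic sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Each <url> entry carries the page address in its <loc> child.
    return [url.findtext(SITEMAP_NS + "loc").strip()
            for url in root.findall(SITEMAP_NS + "url")]

# Hypothetical sitemap content; a real job fetches this from the
# "Data Source Address (URL)" configured in the crawler form.
example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/catalog/product/a.html</loc></url>
  <url><loc>http://example.com/catalog/product/b.html</loc></url>
</urlset>"""

for page in sitemap_urls(example):
    print(page)
```

A crawler then fetches each listed URL, stores the page in the configured WebDAV collection, and (with "Accept RDF" enabled) passes it to the sponger for triple extraction.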
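Once the crawl has run, one way to confirm that sponged triples reached the RDF Quad Store is to query Virtuoso's SPARQL endpoint (by default at http://cname:port/sparql). The generic query below lists named graphs and their triple counts; it assumes the sponger names graphs after the crawled page URLs, so the crawled site's pages should appear among the results:

```sparql
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
```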
---++Related
* [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]
* [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up a Content Crawler Job to Add RDF Data to the Quad Store]]
* [[VirtSetCrawlerJobsGuideSemanticSitemaps][Setting up a Content Crawler Job to Retrieve Semantic Sitemaps (a variation of the standard sitemap)]]
* [[VirtSetCrawlerJobsGuideDirectories][Setting up a Content Crawler Job to Retrieve Content from Specific Directories]]
* [[VirtCrawlerSPARQLEndpoints][Setting up a Content Crawler Job to Retrieve Content from a SPARQL Endpoint]]