---+Guide for Setting up Crawler Jobs for Directories
The following guide describes how to set up a crawler job for retrieving directories using Conductor.
1 Go to the Conductor UI, for example at http://localhost:8890/conductor .
1 Enter the dba credentials.
1 Go to "Web Application Server".
1 Go to "Content Imports".
1 Click "New Target".
1 In the form that appears:
* For "Target description", enter:
Gov.UK data
* For "Target URL", enter:
http://source.data.gov.uk/data/
* For "Copy to local DAV collection", enter a collection belonging to an available user, for example demo:
/DAV/home/demo/gov.uk/
* From the "Local resources owner" list, choose a user, for example demo.
* Click the "Create" button.
1 As a result, the Robot target will be created.
1 Click "Import Queues".
1 For the "Robot target" with label "Gov.UK data", click "Run".
1 As a result, the status of the pages will be shown: retrieved, pending, or waiting.
1 Click "Retrieved Sites".
1 As a result, the total number of pages retrieved should be shown.
1 Go to "Web Application Server" -> "Content Management".
1 Enter the path:
DAV/home/demo/gov.uk
1 Go to the path:
DAV/home/demo/gov.uk/data
1 As a result, the retrieved content will be shown.
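The final check in the steps above can also be done outside Conductor. The following is a minimal sketch only, assuming the Virtuoso instance from this guide serves its DAV repository over HTTP at http://localhost:8890 and that the collection answers a standard WebDAV PROPFIND request, optionally with HTTP Basic authentication. The helper names =dav_url=, =parse_hrefs= and =list_collection= are hypothetical and are not part of Virtuoso.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by WebDAV elements (RFC 4918).
DAV_NS = "{DAV:}"


def dav_url(base_url, collection):
    """Join a server base URL and a DAV collection path, ending with a slash."""
    path = "/" + collection.strip("/") + "/"
    return base_url.rstrip("/") + path


def parse_hrefs(multistatus_xml):
    """Return the href of every resource listed in a WebDAV multistatus body."""
    root = ET.fromstring(multistatus_xml)
    return [el.text.strip() for el in root.iter(DAV_NS + "href") if el.text]


def list_collection(url, user=None, password=None):
    """Send PROPFIND with Depth: 1 and return the hrefs found in the response.

    Assumption: the server accepts HTTP Basic authentication when
    credentials are supplied; adjust if your setup differs.
    """
    request = urllib.request.Request(url, method="PROPFIND", headers={"Depth": "1"})
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return parse_hrefs(response.read())


if __name__ == "__main__":
    # Host, port and collection path are taken from the guide; the
    # credentials are placeholders for the chosen DAV user.
    url = dav_url("http://localhost:8890", "DAV/home/demo/gov.uk/data")
    for href in list_collection(url, user="demo", password="<password>"):
        print(href)
```

If the crawler job has retrieved content, the printed list should include the collection itself plus one entry per retrieved resource or sub-collection.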
---++Related
* [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]
* [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up Crawler Job for inserting RDF data]]
* [[VirtSetCrawlerJobsGuideSitemaps][Setting up Crawler Job for retrieving Sitemaps (basic where the source has RDFa)]]
* [[VirtSetCrawlerJobsGuideSemanticSitemaps][Setting up Crawler Job for retrieving Semantic Sitemaps -- a variation of standard sitemap]]