dtSearch Spider
dtSearch Desktop, Network and Web contain a built-in Spider that provides integrated searching of remote Web site content, along with locally available data. The dtSearch Spider can index and search dynamically generated content, such as ASP/ASP.NET, MS CMS, SharePoint, Sitecore, etc.
The Spider can index XML, HTML, ASP and ASP.NET Web pages, as well as online postings of text-based documents such as PDF, word processor files and spreadsheets.
dtSearch Desktop and Network will display Web pages and documents with highlighted hits, with links and images intact within HTML and PDF files.
How the dtSearch Spider Works
To index a Web site, select "Add Web..." in the Update Index dialog.
In the dialog box, type or paste the name (URL) of the Web site, for example http://www.federalreserve.gov/, then select the crawl depth. A crawl depth of 1 will reach only pages linked directly to the home page; a crawl depth of 4 will reach four levels deep into the site, and so on.
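The effect of the crawl depth setting can be illustrated with a small sketch. This is not dtSearch code: the `crawl` function and the `site` link map are hypothetical, and a plain dictionary stands in for real HTTP fetches. It only shows how a depth limit bounds which pages a breadth-first crawl reaches.

```python
from collections import deque


def crawl(links, start_url, max_depth):
    """Return {url: depth} for every page within max_depth link hops of start_url.

    `links` is a hypothetical in-memory map of page -> pages it links to.
    """
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in depths:  # skip pages that were already reached
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


# Hypothetical site layout, for illustration only.
site = {
    "/": ["/products", "/support"],
    "/products": ["/products/spider"],
    "/products/spider": ["/products/spider/faq"],
    "/support": ["/"],
}

shallow = crawl(site, "/", 1)  # home page plus pages linked directly from it
deep = crawl(site, "/", 4)     # up to four levels below the home page
```

With a depth of 1, only the home page and the two pages it links to are reached; with a depth of 4, every page in this small example is reached.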
Options allow the Spider to crawl across multiple servers from a single starting URL, and to limit the maximum size of files to download, the number of files to index, and the number of minutes to spend indexing a single Web site. The Spider supports the robots "noindex" and "nofollow" meta tags. The Spider can perform "vertical" searching of pages linked from a URL, as well as "horizontal" crawling of sites linked to a URL.
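As an illustration of what honoring the robots meta tags involves, the following sketch reads a page's `<meta name="robots">` tag and reports whether the page may be indexed and whether its links may be followed. It is a simplified example built on Python's standard `html.parser` module, not a description of the Spider's internals; the names `RobotsMetaParser` and `robots_policy` are hypothetical, and other directive values (such as "none") are ignored here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
policy = robots_policy(page)
```

A crawler that respects these tags would skip indexing a page when `policy["index"]` is false and would not queue the page's links when `policy["follow"]` is false.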
Online Demo
For a Spider demo operating through dtSearch Web, click here.
The spidered www.dtsearch.com site is hosted on a completely different hosting system, and in a different physical location, from the site that is running the Search Site demo.
Technical Notes
Web pages or text can be cached in version 7 indexes; see here for details.
In addition to searching publicly available Web sites, the Spider also supports indexing and searching of secure content on HTTPS sites and password-accessible sites. The Spider also supports forms-based authentication.
For information on searching ASP, please see this FAQ article: How to use dtSearch Web with dynamically-generated content.