dtSearch Spider

dtSearch Desktop, Network and Web contain a built-in Spider that provides integrated searching of remote Web site content alongside locally available data. The dtSearch Spider can index and search dynamically generated content, such as ASP/ASP.NET, MS CMS, SharePoint and Sitecore.

 The Spider can index XML, HTML, ASP and ASP.NET Web pages, as well as online postings of text-based documents such as PDF, word processor files and spreadsheets.
 dtSearch Desktop and Network display Web pages and documents with highlighted hits, keeping links and images intact within HTML and PDF files.

How the dtSearch Spider Works

To index a Web site, select "Add Web..." in the Update Index dialog.


In the dialog box, type or paste the URL of the Web site, for example http://www.federalreserve.gov/, then select the crawl depth. A crawl depth of 1 reaches only pages linked directly from the home page, a crawl depth of 4 reaches four levels deep into the site, and so on.
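The effect of the crawl depth setting can be illustrated with a short sketch. This is not dtSearch code: it is a generic breadth-first traversal over a made-up, in-memory link map (`SITE`), and the function name `crawl_to_depth` is hypothetical. It assumes the starting page counts as depth 0, so that depth 1 covers the pages linked directly from it.

```python
from collections import deque


def crawl_to_depth(links, start_url, max_depth):
    """Return {url: depth} for every page within max_depth links of start_url.

    `links` maps each page to the pages it links to (a stand-in for
    fetching a page and extracting its hyperlinks).
    """
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in depths:  # visit each page only once
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


# Hypothetical site structure, for illustration only.
SITE = {
    "/": ["/about", "/products"],
    "/products": ["/products/spider"],
    "/products/spider": ["/products/spider/faq"],
}

print(sorted(crawl_to_depth(SITE, "/", 1)))
# ['/', '/about', '/products']
```

With a depth of 1, only the start page and the pages it links to directly are reached. Raising the depth to 3 would also reach "/products/spider/faq".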

Options allow the Spider to crawl across multiple servers from a single starting URL, and to limit the maximum size of files to download, the number of files to index and the number of minutes to spend indexing a single Web site. The Spider supports the robots "noindex" and "nofollow" meta tags. The Spider can perform "vertical" searching of pages linked from a URL, as well as "horizontal" crawling of sites linked to a URL.
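For readers unfamiliar with the robots meta tags, the sketch below shows how a crawler might read them from a page. It uses only the Python standard library and is an assumption-laden illustration, not the Spider's implementation: the names `RobotsMetaParser` and `robots_policy` are hypothetical, and treating the "none" value as both "noindex" and "nofollow" follows common crawler convention.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    blocked = "none" in found  # "none" is shorthand for noindex, nofollow
    return {
        "index": not (blocked or "noindex" in found),
        "follow": not (blocked or "nofollow" in found),
    }


page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
print(robots_policy(page))
# {'index': False, 'follow': True}
```

A page without a robots meta tag yields `{'index': True, 'follow': True}`, which is the usual default.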

Online Demo
For a Spider demo operating through dtSearch Web, click here.
The www.dtsearch.com spidered site is hosted on a completely different hosting system and physical location from the site that is running the Search Site demo.

Technical Notes
Web pages or text can be cached in version 7 indexes; see here for details.

In addition to searching publicly available Web sites, the Spider supports indexing and searching of secure (HTTPS) sites and password-protected sites. The Spider also supports forms-based authentication.
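As a rough illustration of what fetching password-protected HTTPS content involves, the sketch below prepares an HTTP Basic authentication opener with the Python standard library. It is not how the Spider is configured: the helper names, the example.com URL and the credentials are all placeholders, and Basic authentication is only one of several possible schemes (forms-based login, for instance, works differently).

```python
import urllib.request


def make_password_manager(base_url, username, password):
    """Register credentials for every URL under base_url, whatever the realm."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, username, password)
    return manager


def make_opener(manager):
    """Build an opener that answers HTTP Basic authentication challenges."""
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


# Placeholder site and credentials, for illustration only.
manager = make_password_manager("https://example.com/private/", "crawler", "secret")
opener = make_opener(manager)

# A crawler would then call, for example:
#   opener.open("https://example.com/private/report.pdf")
# and the credentials would be sent only for URLs under the registered base.
```

Scoping the credentials to a base URL keeps them from being sent to other servers if the crawl follows links off the site.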

For information on searching ASP, please see this FAQ article:
How to use dtSearch Web with dynamically-generated content.

 

 

Related Topics

 FindPlus® Distributed Searching for searching remote enterprise servers.
 dtSearch Web for adding searching to your own Web site.
 For more information, see dtSearch Desktop, dtSearch Network

The dtSearch Engine includes a .NET API for the Spider.