yacy_search_server/source/net/yacy/document/parser
orbiter c36da90261 added a very fast ftp file list generator to site crawler:
- when a site crawl for ftp sites is started, a special directory-tree harvester fetches the complete directory structure of the ftp server at once
- the harvester runs concurrently and feeds into the normal crawl queue
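
The harvester idea described above can be illustrated with a minimal sketch. This is not YaCy's actual code: the class name `FtpTreeHarvesterSketch`, the in-memory `listing` map (a stand-in for real ftp directory listings), and the `END` marker are all assumptions made for illustration. It only shows the general pattern of a background thread walking a directory tree and feeding file paths into a queue that a crawler consumes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FtpTreeHarvesterSketch {
    // Hypothetical end-of-stream marker telling the consumer that harvesting is done.
    public static final String END = "\u0000END";

    // Starts a background thread that walks the (simulated) directory tree
    // and puts every file path into the crawl queue. Directory entries end with "/".
    public static Thread startHarvester(Map<String, List<String>> listing,
                                        String root,
                                        BlockingQueue<String> crawlQueue) {
        Thread t = new Thread(() -> {
            Deque<String> pending = new ArrayDeque<>();
            pending.push(root);
            try {
                while (!pending.isEmpty()) {
                    String dir = pending.pop();
                    for (String entry : listing.getOrDefault(dir, List.of())) {
                        String path = dir + entry;
                        if (entry.endsWith("/")) {
                            pending.push(path);   // descend into subdirectory later
                        } else {
                            crawlQueue.put(path); // hand the file over to the crawler
                        }
                    }
                }
                crawlQueue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Consumer side: takes paths from the queue until the end marker arrives.
    public static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
        List<String> out = new ArrayList<>();
        for (String s = queue.take(); !s.equals(END); s = queue.take()) {
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated listing of a small ftp server (assumption, no network access).
        Map<String, List<String>> listing = Map.of(
            "/", List.of("pub/", "README"),
            "/pub/", List.of("a.txt", "b.txt"));
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        startHarvester(listing, "/", queue);
        for (String path : drain(queue)) {
            System.out.println("ftp://ftp.example.org" + path);
        }
    }
}
```

A blocking queue is used here so that the consumer can start processing entries while the harvester is still walking the tree; a real implementation would replace the map lookup with ftp directory listings.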

also in this commit:
- fixed the 'start from file' crawl function
- added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags.
- as a result, a crawl can now also be started from plain-text link files
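
To illustrate what detecting links outside of <a> tags could look like, here is a minimal sketch. It is not the detector added in this commit: the class name `PlainTextLinkDetector`, the regular expression, and the trailing-punctuation rule are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainTextLinkDetector {
    // Hypothetical pattern: a known scheme followed by characters that are
    // not whitespace, angle brackets, or quotes.
    private static final Pattern URL_PATTERN =
        Pattern.compile("\\b(?:https?|ftp|file)://[^\\s<>\"']+");

    // Returns all URL-like substrings found in the given text, in order of appearance.
    public static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            String url = m.group();
            // Drop sentence punctuation that directly follows the URL.
            while (!url.isEmpty() && ".,;:!?)".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            links.add(url);
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See ftp://ftp.example.org/pub/ and http://example.com/index.html.";
        for (String link : extractLinks(text)) {
            System.out.println(link);
        }
    }
}
```

With a detector of this kind, a plain-text file containing one URL per line (or URLs embedded in running text) can serve as a crawl start list.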

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-09 17:17:25 +00:00
..
html added a very fast ftp file list generator to site crawler: 2010-12-09 17:17:25 +00:00
images - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index 2010-11-30 16:13:55 +00:00
xml - added new protocol loader for 'file'-type URLs 2010-05-25 12:54:57 +00:00
bzipParser.java fixed bugs in parser and ftp client 2010-12-02 11:05:04 +00:00
csvParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
docParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
genericParser.java fixed bugs in parser and ftp client 2010-12-02 11:05:04 +00:00
gzipParser.java fixed bugs in parser and ftp client 2010-12-02 11:05:04 +00:00
htmlParser.java - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed 2010-09-27 14:54:32 +00:00
odtParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
ooxmlParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
pdfParser.java - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index 2010-11-30 16:13:55 +00:00
pptParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
psParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
rssParser.java - enhancements for search speed 2010-10-04 11:54:48 +00:00
rtfParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
sevenzipParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
sitemapParser.java added a sitemap entry parser and loader for sitemaps 2010-11-03 19:48:33 +00:00
swfParser.java more performance hacks 2010-10-09 08:55:57 +00:00
tarParser.java fixed bugs in parser and ftp client 2010-12-02 11:05:04 +00:00
torrentParser.java - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index 2010-11-30 16:13:55 +00:00
vcfParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
vsdParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
xlsParser.java Support for indexing of RSS feeds! 2010-08-25 18:24:54 +00:00
zipParser.java fixed bugs in parser and ftp client 2010-12-02 11:05:04 +00:00