yacy_search_server/source/net/yacy/document/parser/html
orbiter c36da90261 added a very fast FTP file list generator to the site crawler:
- when a site crawl of an FTP site is started, a special directory-tree harvester now fetches the complete directory structure of the FTP server at once
- the harvester runs concurrently and feeds into the normal crawl queue

Also in this commit:
- fixed the 'start from file' crawl function
- added a link detector to the HTML parser. The HTML parser can now also extract links that are not enclosed in <a> tags.
- as a result, a crawl can now also be started from plain-text link files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-09 17:17:25 +00:00
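The commit above says the HTML parser can now pick up links that are not wrapped in <a> tags, which is what makes crawl starts from plain-text link files possible. The following is only a minimal sketch of that idea, not YaCy's actual code in ContentScraper.java: the class name `PlainTextLinkDetector`, the method `detectLinks`, and the regular expression are hypothetical and assume that a "link" is any token starting with http, https, ftp or file followed by "://".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a plain-text link detector (not YaCy's implementation).
final class PlainTextLinkDetector {

    // Assumed pattern: a scheme (http, https, ftp, file), "://", then a run of
    // characters that are not whitespace, quotes or angle brackets.
    private static final Pattern LINK =
            Pattern.compile("(?i)\\b(?:https?|ftp|file)://[^\\s\"'<>]+");

    // Trailing characters that usually belong to the surrounding sentence.
    private static final String TRAILING_PUNCTUATION = ".,;:!?)";

    /** Returns every URL-like token found in the text, in order of appearance. */
    static List<String> detectLinks(final String text) {
        final List<String> links = new ArrayList<>();
        final Matcher m = LINK.matcher(text);
        while (m.find()) {
            String link = m.group();
            // strip sentence punctuation glued to the end of the URL
            while (!link.isEmpty()
                    && TRAILING_PUNCTUATION.indexOf(link.charAt(link.length() - 1)) >= 0) {
                link = link.substring(0, link.length() - 1);
            }
            links.add(link);
        }
        return links;
    }

    public static void main(final String[] args) {
        final String text = "Start here: http://example.org/a.html, then ftp://ftp.example.net/pub/.";
        for (final String link : detectLinks(text)) {
            System.out.println(link);
        }
    }
}
```

Running `main` prints `http://example.org/a.html` and `ftp://ftp.example.net/pub/`. A regex scan like this is cheap enough to run over every text chunk a parser sees; a real detector would additionally need to normalize the URLs and avoid reporting links already found in <a> tags twice.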
AbstractScraper.java performance hacks for better search performance 2010-10-08 23:50:28 +00:00
AbstractTransformer.java removed finalize methods because of a hint in 2010-04-23 09:32:29 +00:00
CharacterCoding.java performance hacks for better search performance 2010-10-08 23:50:28 +00:00
ContentScraper.java added a very fast ftp file list generator to site crawler: 2010-12-09 17:17:25 +00:00
ContentTransformer.java applied code changes that are recommended by PMD 2010-01-10 23:09:48 +00:00
ImageEntry.java - added new protocol loader for 'file'-type URLs 2010-05-25 12:54:57 +00:00
Scraper.java
ScraperInputStream.java - added new protocol loader for 'file'-type URLs 2010-05-25 12:54:57 +00:00
ScraperListener.java
Transformer.java
TransformerWriter.java * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null) 2010-10-26 16:10:20 +00:00