Commit Graph

6 Commits

Author SHA1 Message Date
Michael Peter Christen
d8beafba3a fix for values in CrawlProfileEditor table and xml; now the full profile
is available in the xml.
2014-10-09 13:27:20 +02:00
Michael Peter Christen
ec95dfa2e6 fixed crawl profile xml result which did not show the correct crawl
status.
2014-10-08 18:48:57 +02:00
Michael Peter Christen
329262231f unresolved pattern fix 2014-10-01 15:34:43 +02:00
orbiter
f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
- nobody understand the auto-dom filter without a lenghtly introduction about the function of a crawler
- nobody ever used the auto-dom filter other than with a crawl depth of 1
- the auto-dom filter was buggy since the filter did not survive a restart and then a search index contained waste
- the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain
- the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives a transparent feed-back what it does before it can be used
- the new option also fits into the easy site-crawl start menu

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 12:50:34 +00:00
orbiter
dba7ef5144 extended crawling constraints:
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty sring (match-never)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 09:58:56 +00:00
f1ori
b9602e891a * added CrawlProfileEditor_p.xml for monitoring in yacybar
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-18 09:13:02 +00:00