Michael Peter Christen
1825f165b8
better integration of blacklist according to use case
2012-07-02 13:57:29 +02:00
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html parser.
This makes it possible for every type of document to be a crawl start
point, not only text documents or html documents. Tested this with a
pdf document.
2012-01-23 17:27:29 +01:00
Michael Peter Christen
ce620be783
fix for crawl start with smb url
2012-01-19 23:07:15 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced a check based on the local/remote crawl size ratio
2011-12-29 00:33:16 +01:00
orbiter
5a55397f99
some last-minute performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
c93f10417a
add a bookmark automatically each time a new crawl is started
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-22 00:03:20 +00:00
orbiter
017a01714d
- enhanced logging in robots.txt parser for remote debugging
...
- robots.txt is now more robust against database operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 01:03:49 +00:00
cominch
cef8ebc41d
getpageinfo: Checks if there is an OAI repository behind the URL.
...
This check is only performed if the oai parameter is set in the call, e.g. getpageinfo_p.xml?actions=oai
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 12:22:19 +00:00
orbiter
eb1c7c041d
write info about robots.txt evaluation into getpageinfo_p.xml
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8038 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:33:54 +00:00
orbiter
f8b8c82421
- refactoring of getpageinfo_p.xml (moved out of util)
...
- added more logging in getpageinfo_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:22:40 +00:00