orbiter
df1629b05a
- code cleanup
...
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
theli
f3ac4dbbb9
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
orbiter
3879a0ecd0
replaced java.net.URL usage by use of new class de.anomic.net.URL
...
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparisment.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
90d569d70f
refactoring of index management:
...
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
63f39ac7b5
added 3 new crawling steering options:
...
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1fc3b34be6
some pre-work (without function yet) to implement:
...
- re-crawl (by age of last crawl)
- auto-crawl-filter by crawl depth (to be explained..)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 15:28:17 +00:00
theli
bab74b0499
*) escaping special chars in the url properly
...
- was a problem for QuickCrawlLink_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1607 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-11 13:30:36 +00:00
theli
6d73c6b481
*) adding xml version for QuickCrawlLink_p.java (used by yacybar)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-27 15:36:42 +00:00
theli
3feeba3d7b
*) better handling of urls containing query parameters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1445 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-26 09:34:50 +00:00
orbiter
a04930f025
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
theli
6f8d7d3bcd
*) Adding first version of YaCy bookmarklet
...
- this can be used to easily crawl a webpage which is currently opened in the browser
- to get the bookmarklet javascript simply call http://localhost:8000/QuickCrawlLink_p.html
and drag and drop the link shown to your Browsers Toolbar/Link-Bar.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-08 12:14:51 +00:00