theli
7930839594
*) URL.java: userinfo was not carried over when generating a new URL from a base URL and a relative path
...
*) CrawlWorker.java: using new dirhtml function of ftpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 05:17:57 +00:00
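The behaviour this fix aims for (userinfo surviving resolution of a relative path against a base URL) can be illustrated with a minimal sketch. It uses the standard java.net.URI class as a stand-in and is not the de.anomic.net.URL code. The class name UserInfoSketch, the helper resolve, and the example paths are assumptions made for illustration only.

```java
import java.net.URI;

public class UserInfoSketch {

    // Hypothetical helper: resolves relPath against base using java.net.URI.
    // This only illustrates the expected result; it is not YaCy's implementation.
    static String resolve(String base, String relPath) {
        URI resolved = URI.create(base).resolve(relPath);
        return resolved.toString();
    }

    public static void main(String[] args) {
        // Example host taken from the commit message; the path is made up.
        String base = "ftp://username:pwd@ftp.irgendwas.de/pub/docs/";

        // The userinfo part ("username:pwd") stays attached to the new URL.
        System.out.println(resolve(base, "file.txt"));

        // A back-path ("..") is resolved against the base directory.
        System.out.println(resolve(base, "../file.txt"));

        // The userinfo can also be read back from the resolved URI.
        System.out.println(URI.create(resolve(base, "file.txt")).getUserInfo());
    }
}
```

Keeping the userinfo matters here because, as a later entry in this log notes, it carries the authentication data for ftp URLs.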
theli
5847492537
*) next step of restructuring for new crawlers
...
- IndexCreate_p.java: correcting problems with ftp urls
- URL.java no longer cuts out the userinfo
(needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
- plasmaCrawlLoader.java:
-- hack to re-enable https urls
-- adding function getSupportedProtocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 13:17:11 +00:00
orbiter
6cce47e217
test of ftp-urls in URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 13:10:40 +00:00
orbiter
f933f00f09
another patch to URL protocol handling for 'news', 'nntp' etc:
...
reject it! (the java.net.URL class rejects them too)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 01:04:04 +00:00
orbiter
4c6e00d80a
more bugfixes for URL class, see:
...
http://www.yacy-forum.de/viewtopic.php?p=24844#24844
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 00:23:39 +00:00
orbiter
b7dc251948
fixed bugs in url class:
...
- correct backpath ('..') handling
- correct absolute path handling
- included https
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 22:27:01 +00:00
orbiter
276225d79e
fix for URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 21:33:00 +00:00
orbiter
f43c90fa98
fixed handling of null referer in crawlOrder
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 21:46:34 +00:00
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
theli
0db237467f
*) bugfix for URL generation from file
...
see: http://www.yacy-forum.de/viewtopic.php?p=24116
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-25 16:18:45 +00:00
orbiter
e20ff77c10
another bugfix in new url class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:37:22 +00:00
orbiter
685430a1b5
bugfix in new URL class, better logging for domain extraction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:33:01 +00:00
orbiter
79af283f6c
better debugging in new URL class for wrong port numbers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 10:21:24 +00:00
orbiter
4bd626572b
added hashCode and compareTo to new URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 12:00:54 +00:00
theli
a70cbd959b
*) further improvements for the anomic.net.url class
...
- relpaths starting with javascript: are now ignored
- bugfix for concatenation of relpath starting with # or ?
in this case no slash should be added to the baseURL, otherwise
we get URLs of the form http://test.de/index.html/?param=value
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 05:12:08 +00:00
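The two rules described in this commit can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual de.anomic.net.URL code: the class name RelPathSketch and the method join are hypothetical, and the fallback branch only handles plain relative file names.

```java
public class RelPathSketch {

    // Hypothetical helper illustrating the rules from the commit message.
    // Returns null when the relative path is to be ignored.
    static String join(String baseUrl, String relPath) {
        // Rule 1: relpaths starting with "javascript:" are ignored.
        if (relPath.startsWith("javascript:")) {
            return null;
        }
        // Rule 2: a relpath starting with '#' or '?' is appended directly,
        // without inserting a slash after the base URL.
        if (relPath.startsWith("#") || relPath.startsWith("?")) {
            return baseUrl + relPath;
        }
        // Simplified fallback (assumption): replace the last path segment
        // of the base URL with the relative path.
        int schemeEnd = baseUrl.indexOf("://") + 3;
        int lastSlash = baseUrl.lastIndexOf('/');
        String dir = lastSlash >= schemeEnd
                ? baseUrl.substring(0, lastSlash + 1)
                : baseUrl + "/";
        return dir + relPath;
    }

    public static void main(String[] args) {
        String base = "http://test.de/index.html";
        System.out.println(join(base, "?param=value"));       // no slash before '?'
        System.out.println(join(base, "#top"));               // no slash before '#'
        System.out.println(join(base, "javascript:void(0)")); // ignored, prints null
        System.out.println(join(base, "other.html"));         // sibling file
    }
}
```

With the direct append in rule 2, the malformed form http://test.de/index.html/?param=value mentioned in the commit message cannot arise.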
theli
8a1f1d96b3
*) Bugfix for url concatenation. Relative urls with / or http:// at the beginning
...
were not handled correctly on url concatenation via new URL(URL,relPath).
See: http://www.yacy-forum.de/viewtopic.php?t=2623
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 04:48:18 +00:00
orbiter
3879a0ecd0
replaced java.net.URL usage by use of new class de.anomic.net.URL
...
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00