Commit Graph

247 Commits

Author SHA1 Message Date
theli
2cb084d426 *) Complete Index Transfer
See: http://www.yacy-forum.de/viewtopic.php?p=9622

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 10:37:16 +00:00
theli
d1de71e9f6 *) Suppress stacktrace on proxy error for "No route to host Exception"
See: http://www.yacy-forum.de/viewtopic.php?t=1153

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 20:21:38 +00:00
theli
56160cbd01 *) Bugfix for "YaCy verzählt sich ..." ("YaCy miscounts ...") Bug.
See: http://www.yacy-forum.de/viewtopic.php?p=9559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 05:26:01 +00:00
orbiter
43b42854a0 fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 03:54:52 +00:00
theli
3587407039 *) Fixing problems of list operation if index and queue size are both 0.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 22:27:48 +00:00
theli
51b48a10e8 *) Suppress stacktrace on proxy error for "ValidatorException: No trusted certificate found"
See: http://www.yacy-forum.de/viewtopic.php?t=1110

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 20:37:01 +00:00
theli
7fe8784231 *) URLs pointing to a server with a private IP address will no longer be indexed
See: http://www.yacy-forum.de/viewtopic.php?p=9408

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 21:38:03 +00:00
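
A private-address filter like the one described in the commit above could be sketched as follows. This is only an illustration, not the YaCy code: the class name PrivateAddressCheck and the method isPrivate are hypothetical, and treating loopback, link-local and wildcard addresses as "private" alongside the site-local ranges is an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not taken from the YaCy sources. */
final class PrivateAddressCheck {

    private PrivateAddressCheck() {}

    /**
     * Returns true if the host resolves to a loopback, link-local, wildcard,
     * or site-local (10/8, 172.16/12, 192.168/16) address.
     * Assumption: hosts that cannot be resolved are reported as not private;
     * a real crawler would probably reject them in a separate step.
     */
    static boolean isPrivate(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return addr.isLoopbackAddress()
                || addr.isLinkLocalAddress()
                || addr.isAnyLocalAddress()
                || addr.isSiteLocalAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

A crawler would call isPrivate(url.getHost()) before handing the URL to the indexer and skip the URL when the result is true.
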
theli
0aafb83edc *) Bugfix for robots.txt isDisallowed Check.
Setting path to "/" if it is null or empty.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 13:18:34 +00:00
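
The path fallback mentioned in the commit above might look roughly like this. The sketch assumes a simple prefix match against a list of Disallow values; the names RobotsPathCheck, normalizePath and the two-argument isDisallowed signature are hypothetical and do not come from the YaCy sources.

```java
import java.util.List;

/** Hypothetical sketch of a robots.txt path check; not the YaCy implementation. */
final class RobotsPathCheck {

    private RobotsPathCheck() {}

    /** Falls back to "/" when the URL has no path, as the commit message describes. */
    static String normalizePath(String path) {
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    /**
     * Assumption: a path is disallowed when it starts with any non-empty
     * Disallow prefix. An empty Disallow value means "allow everything".
     */
    static boolean isDisallowed(List<String> disallowedPrefixes, String path) {
        String effective = normalizePath(path);
        for (String prefix : disallowedPrefixes) {
            if (!prefix.isEmpty() && effective.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Without the fallback, a null path would raise a NullPointerException in startsWith, and an empty path would slip past a "Disallow: /" rule.
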
borg-0300
8260128ee9 changed getFreeSize();
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 11:22:41 +00:00
theli
f8ad65eae1 *) First trial implementation of robots.txt support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 11:17:21 +00:00
borg-0300
0a57fbcde5 Added new HashSet filesInUse;
Added new Function getFreeSize();

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 09:37:00 +00:00
borg-0300
8cd6a52dd0 Convention
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 07:26:19 +00:00
borg-0300
c0e3d18bbf *) remove import java.lang
*) Added Super()
*) replaced startsWith()
*) cleaned


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 16:58:12 +00:00
borg-0300
b1cd1fa917 cleaned
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 14:56:19 +00:00
borg-0300
da9c6857fb *) corrected a misunderstanding, no BUG ;)
*) finals and other

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 14:17:53 +00:00
borg-0300
fbac053c03 small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 11:23:48 +00:00
theli
578f36ae18 *) Speedup of indexer. Proxy files will not be enqueued by the cachemanager
   into the sb-queue anymore if the mimeType or fileExtension is not supported
   by the installed parsers.
   - Advantage: Avoiding unnecessary enqueueing and dequeueing from queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 11:17:37 +00:00
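
The pre-filter described in the commit above can be pictured as a check that runs before an entry reaches the queue. The sketch below is an assumption-laden illustration: the class ParserSupportCheck and its method shouldEnqueue are hypothetical, and it reads the commit message literally, so an entry is skipped when either the MIME type or the file extension is unsupported.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-filter; the real YaCy check may combine the two tests differently. */
final class ParserSupportCheck {

    private final Set<String> mimeTypes;
    private final Set<String> extensions;

    /** Both sets are expected to contain lower-case values. */
    ParserSupportCheck(Set<String> mimeTypes, Set<String> extensions) {
        this.mimeTypes = new HashSet<String>(mimeTypes);
        this.extensions = new HashSet<String>(extensions);
    }

    /** Returns true only when an installed parser supports both the type and the extension. */
    boolean shouldEnqueue(String mimeType, String fileExtension) {
        if (mimeType == null || fileExtension == null) {
            return false;
        }
        return mimeTypes.contains(mimeType.toLowerCase(Locale.ROOT))
            && extensions.contains(fileExtension.toLowerCase(Locale.ROOT));
    }
}
```

Rejecting entries at this point means unsupported files never cost an enqueue and a later dequeue.
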
theli
1219ef99f0 *) Bugfix for NullPointerException in yacyDebugMode Init
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:51:15 +00:00
theli
6c722706b7 *) Moving yacyDebugMode initialization to switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:34:34 +00:00
theli
4e07828807 *) httpdProxyHandler.java
- harmonizing proxy exception handling
- adding malformed URL + blacklist check for http head method
- adding malformed URL check to http post method
- chunked encoding is now not used anymore for http post if clients
  are http/0.9 or http/1.0 clients (same behaviour as already implemented for get)
- now an exception will be thrown on internal httpc errors to force an error output
  to the client or a connection close. This should help to fix the "binary data in browser window" bug

*) plasmaSwitchboard.java
- fixing the following Bug
  E 2005/09/03 18:02:42 PLASMA Could not index URL http://mis04.de/FAIL/snot.php: null
  java.lang.NullPointerException
	at de.anomic.plasma.plasmaSwitchboard.processResourceStack(plasmaSwitchboard.java:1000)
	at de.anomic.plasma.plasmaSwitchboard.deQueue(plasmaSwitchboard.java:625)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at de.anomic.server.serverInstantThread.job(serverInstantThread.java:95)
	at de.anomic.server.serverAbstractThread.run(serverAbstractThread.java:243)
  This bug could occur if the cached responseHeader is null
- getting the mimeType now from the parsed document instead of the responseHeader because the 
  mimeType could have been changed during content parsing (e.g. because of the mimetypeParser)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@656 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:10:00 +00:00
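
The chunked-encoding rule from the httpdProxyHandler.java notes in the commit above can be expressed as a small version check. This is a minimal sketch: ChunkedEncodingPolicy and mayUseChunked are hypothetical names, and the handling of a missing version string is an assumption rather than documented proxy behaviour.

```java
import java.util.Locale;

/** Hypothetical policy helper; not taken from httpdProxyHandler.java. */
final class ChunkedEncodingPolicy {

    private ChunkedEncodingPolicy() {}

    /**
     * HTTP/0.9 and HTTP/1.0 clients do not understand chunked transfer
     * encoding, so it is only offered to newer clients.
     * Assumption: a missing version string is treated like an old client.
     */
    static boolean mayUseChunked(String httpVersion) {
        if (httpVersion == null) {
            return false;
        }
        String version = httpVersion.trim().toUpperCase(Locale.ROOT);
        return !(version.equals("HTTP/0.9") || version.equals("HTTP/1.0"));
    }
}
```
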
borg-0300
81cb8feb15 back to 649 :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-04 22:03:44 +00:00
borg-0300
5194511e8e *) attempt to find bug
See: http://www.yacy-forum.de/viewtopic.php?t=1121

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-04 19:08:51 +00:00
theli
6991b9e2b9 *) Suppress stacktrace on crawler error for "Connection reset"
See: http://www.yacy-forum.de/viewtopic.php?p=9071

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@645 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-03 15:17:19 +00:00
theli
a47f9238fe *) Blacklist is now also used by the crawler
See: http://www.yacy-forum.de/viewtopic.php?t=1069

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 12:09:45 +00:00
theli
dc0a2d4c11 *) Bugfix for Loader Queue:
Job count was not displayed correctly
*) IndexingQueue:
- now it's possible to delete single entries from the queue
- now it's possible to clear the whole queue
  See: http://www.yacy-forum.de/viewtopic.php?t=995

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 11:40:40 +00:00
theli
732a107160 *) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
   - Logging message for "urlEntry.url() == null" is now displayed as info
   - IndexCreateWWWLocalQueue_p.html now detects null entries while looping through the list and removes them automatically
   See:
   - http://www.yacy-forum.de/viewtopic.php?t=532#8781
   - http://www.yacy-forum.de/viewtopic.php?t=639
   - http://www.yacy-forum.de/viewtopic.php?t=1071
   - http://www.yacy-forum.de/viewtopic.php?t=338
   - http://www.yacy-forum.de/viewtopic.php?t=980

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 09:33:05 +00:00
theli
33aaffbfc6 *) Displaying content size of each entry in indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 08:22:11 +00:00
borg-0300
7626823519 BUGFIX for last 'commit'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 23:43:27 +00:00
borg-0300
971756e8dd the delete size is smaller
See: http://www.yacy-forum.de/viewtopic.php?t=1084

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 23:35:00 +00:00
theli
0471019606 *) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 22:05:20 +00:00
borg-0300
cc493ef8c1 Added change from Hermes
See: http://www.yacy-forum.de/viewtopic.php?t=1050

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 11:18:41 +00:00
theli
bead8a32aa *) IndexCreate_p.java:
   Crawler StartURLs will now also be added to the errorURL-DB if an error occurs on this URL
*) kelondroStack.java, plasmaSwitchboardQueue.java
   Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java 
   instead of an iterator to display the indexing-list. 
   Advantages: avoid concurrent modifications of the list while displaying it. 
               Speedup because now we have to access only one sync function instead of multiple ones 
               (one for each entry)
*) IndexCreateIndexingQueue_p.java
   Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
   If a servlet returns the special value "LOCATION", the httpdFileHandler redirects
   the browser to the URL specified by the servlet. This can be used, e.g., when an HTTP GET request is
   used instead of a POST request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
   Now it's possible to delete single entries of the local crawler queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 07:52:46 +00:00
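
The list-instead-of-iterator idea from the kelondroStack.java / plasmaSwitchboardQueue.java notes in the commit above can be illustrated with a snapshot copy. The sketch assumes an in-memory queue; SnapshotQueue and its methods are hypothetical names and not the kelondroStack API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory queue that hands out snapshots for display purposes. */
final class SnapshotQueue<E> {

    private final List<E> entries = new ArrayList<E>();

    synchronized void push(E entry) {
        entries.add(entry);
    }

    /** Removes and returns the oldest entry, or null if the queue is empty. */
    synchronized E pop() {
        return entries.isEmpty() ? null : entries.remove(0);
    }

    /**
     * Returns a copy taken under a single lock. Callers can iterate the copy
     * while other threads keep modifying the queue, and only one synchronized
     * call is needed instead of one per entry.
     */
    synchronized List<E> list() {
        return new ArrayList<E>(entries);
    }
}
```

A display page would iterate over the result of list() once; later pushes or pops do not affect that copy, so no ConcurrentModificationException can arise from it.
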
theli
48aaf703cc *) Adding additional logging output to detect crawling problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@625 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 06:55:21 +00:00
theli
59b8a98c7e *) Bugfix for suppressing of stacktrace in log on crawler error "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8840

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 06:31:30 +00:00
borg-0300
c1d7527929 better cache cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-31 13:07:08 +00:00
theli
2e6df95786 *) adding toString method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-31 10:43:03 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
   - please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
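
The renamed methods in the commit above line up with the FINE and SEVERE levels of java.util.logging. A thin wrapper in that spirit might look like the sketch below; LogSketch and CollectingHandler are hypothetical classes written for illustration, not the YaCy logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical wrapper whose method names mirror the Java Logging API levels. */
final class LogSketch {

    private final Logger logger;

    LogSketch(Logger logger) {
        this.logger = logger;
    }

    /** Replaces the older logDebug naming. */
    void logFine(String message) {
        logger.log(Level.FINE, message);
    }

    /** Replaces the older logFailure and logError naming. */
    void logSevere(String message) {
        logger.log(Level.SEVERE, message);
    }
}

/** Records the level of every published record; handy for checking the mapping. */
final class CollectingHandler extends Handler {

    final List<Level> levels = new ArrayList<Level>();

    @Override
    public void publish(LogRecord record) {
        levels.add(record.getLevel());
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}
```
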
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
   - please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
theli
f19c09b227 *) Suppress stacktrace on crawler error for "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8733#8733

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 20:25:07 +00:00
theli
cc1df08069 *) Adding missing synchronized blocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@608 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 14:57:32 +00:00
borg-0300
bf14e6def5 *) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
   - paths are now absolute
*) move path check from plasmaHTCache to plasmaSwitchboard
   - only one path check when starting
*) other small changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 12:50:30 +00:00
theli
9b818b1ce3 *) Pausing Crawlers if there is not enough space on disk
See: http://www.yacy-forum.de/viewtopic.php?p=8648

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 09:43:27 +00:00
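
A disk-space guard for the crawler pause described in the commit above could be sketched as follows. It relies on File.getUsableSpace(), which only exists since Java 6 and therefore cannot be what the 2005 code used; DiskSpaceGuard and the threshold parameter minFreeBytes are hypothetical.

```java
import java.io.File;

/** Hypothetical guard that tells a crawler to pause when disk space runs low. */
final class DiskSpaceGuard {

    private final long minFreeBytes;

    DiskSpaceGuard(long minFreeBytes) {
        this.minFreeBytes = minFreeBytes;
    }

    /** Returns true when the usable space has dropped below the threshold. */
    boolean shouldPause(long usableBytes) {
        return usableBytes < minFreeBytes;
    }

    /** Convenience overload that asks the file system for the usable space. */
    boolean shouldPause(File dataDir) {
        return shouldPause(dataDir.getUsableSpace());
    }
}
```

A crawler loop would call shouldPause(dataDir) before fetching the next URL and sleep for a while when it returns true.
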
theli
b33094e925 *) Trying to solve "Too many open files bug"
*) Temp.Bugfix for "Bug in Index Restore"
   See: http://www.yacy-forum.de/viewtopic.php?p=8647#8647
   Orbiter: Please take a look



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 09:07:42 +00:00
theli
34790acf02 *) Bugfix for suppressing of stacktrace in log on crawler error "unknown host"
See: http://www.yacy-forum.de/viewtopic.php?p=8615#8615

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 06:24:23 +00:00
theli
af7b8f75bd *) Making proxyAccessLogging configurable via yacy.logging file
   - logging can be disabled now
   - logging directory / filelimit / rotation count can be configured now
   See: http://www.yacy-forum.de/viewtopic.php?t=965&postdays=0&postorder=asc&start=30#8280

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:31:58 +00:00
theli
2a081c9ee5 *) Adding additional logging message for "NURL.entry() == null" Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8446

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-28 05:39:26 +00:00
theli
cb1f11c96b *) Suppress stacktrace on crawler error for "Unknown Host"
See: http://www.yacy-forum.de/viewtopic.php?p=8431

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@590 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-28 05:08:26 +00:00
theli
e338a13de3 *) Suppress stacktrace on crawler error for "Read timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8433

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 18:48:02 +00:00
theli
2e43e744de *) Suppress stacktrace on crawler error for "connect timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8420 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 04:53:25 +00:00
theli
36cbe04e3e *) Bugfix for Crawler Redirection Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8422

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 04:36:13 +00:00