Commit Graph

4310 Commits

Author SHA1 Message Date
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
c45117f81f fixed dates in metadata
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-08 22:09:36 +00:00
orbiter
a7d038bb7a The oai ListFriends source list becomes configurable: just write them into defaults/oaiListFriendsSource.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 10:01:37 +00:00
orbiter
7ab207d93a better presentation of search result metadata and fixes to htcache loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-04 20:57:09 +00:00
orbiter
5fbf866cae - fixed resumption token generation for oai-pmh import
- relaxed dublin core parsing: the dc:reference tag may replace dc:identifier if this does not contain a valid url
- parsing of completeRecords number and presentation in the download list of oai import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-02 22:20:24 +00:00
orbiter
fc5efcc05a enhanced and fixed OAI-PMH import
- now importing OAI-PMH server list fron two sources
- simultanous import from several servers (even > 2000)
- check buttons on OAI-PMH server list to select multiple servers for import start
- it is possible to select all servers at once for import
- imported XML data is gzipped after import from surrogate reader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-30 14:03:51 +00:00
sixcooler
c2098f9399 close unused connections if there to many for DHT
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-29 23:38:50 +00:00
orbiter
40a8d132d9 tried to fix 100% CPU when calling Balancer.top()
see also: http://forum.yacy-websuche.de/viewtopic.php?p=19978#p19978

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 22:37:50 +00:00
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused a out of bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
3aad50d38e :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-26 15:26:08 +00:00
orbiter
9edd38fbc5 connectionCount limit too low?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-26 15:24:47 +00:00
orbiter
7a05db0fcb fixed to prevent that too many open connections exist
- create less connections at maximum (smaller httpc connection pool size)
- create less connections per host (2, standard required by RFC)
- do not start DHT distributions if there are too many open connections
- clear open/idle connections earlier; run cleaner more often

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 23:08:36 +00:00
orbiter
a9b9bf667b fix for http://forum.yacy-websuche.de/viewtopic.php?p=19910#p19910
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:48:30 +00:00
orbiter
b18a7606a0 some performance hacks and fixed after reading dump in
http://forum.yacy-websuche.de/viewtopic.php?p=19920#p19920

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:37:36 +00:00
orbiter
2bc3cba6f1 - fix for 'do not write to cache' rule.
- do not read from cache if byte[] array is still filled from response object (will do less IO)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-24 08:22:45 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

The finalize method prevents that the memory, used by the objects containing the finalize method, is collected and available for the garbage collector. Instead, the memory allocated by such classes are enqueued to a java-internal finalize queue runner. This slows down all operations that uses a lot of object containing finalize methods.

this fix does not remove all finalize method, but such that may be used for throw-away objects that are allocated many times. This should cause a better run-time performance and less OutOfMemoryErrors 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00
orbiter
bfa35d6d20 possible fix for ZURL.list counter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 08:46:47 +00:00
orbiter
65f383e70b some adjustments to the httpc after testing with a very slow httpd
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-22 22:10:19 +00:00
orbiter
8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 11:29:27 +00:00
sixcooler
13f5b8e7ba fix for storing/getting bookmark-folders
called by Quix0r

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6825 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 17:55:59 +00:00
orbiter
7b69d79727 enhanced remove() operation: in many cases it is not necessary to return the removed object to the called.
for such cases the delete() operation was introduced which is sometimes much cheaper in operation since it does not need to create objects to hold the removed content and it does not need to read those objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 14:47:41 +00:00
orbiter
93ea0a4789 enhanced remove operation in search consequences (which are triggered when the snippet fetch proves that the word has disappeared from the page that was stored in the index)
- no direct deletion of referenced during search (shifted to time after search)
- bundling of all deletions for the references of a single word into one remove operation
- enhanced remove operation by caring that the collection is stored sorted (experimental)
- more String -> byte[] transition for search word lists
- clean up of unused code
- enhanced memory allocation of RowSet Objects (will use a little bit less memory which was wasted before)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 13:45:22 +00:00
orbiter
64f29f990e a collection of performance hacks and code cleanup:
- removed usage of URL-Caches which could have been a memory leak
- removed unused classes and methods
- removed not necessary synchronizations
- added synchronization hacks where possible
- fine-tuned crawling speed to prevent IO of balancer
- fixed a bug in IODispatcher that may have caused that no merges were done
- reduced number of parameters in very often called methods (compare methods)
- reduced complexity of data structures of now massively used HandleSet class
- reduction of new String() and getBytes() usage / new methods to support this transition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-19 16:42:37 +00:00
orbiter
8b8107b2a3 reduced IO-load and synchronization/blocking
- enhanced the Balancer performance when building new domain stacks using a new Table buffer
- added the new Table buffer BufferedObjectIndex class
- changed order of access to LURL-read (prefereing segment over Crawl Queues) will reduced blocking time on balancer
- fixed PPM setting in Crawler_p servlet (had doubled values)
- reduced synchronization in IndexCell because it is not necessary: reduced blocking during indexing/merging/dumping
- removed did-you-mean cache in IndexCell because that caused too much overhead and more memory usage but was not very useful. This reduced also deadlocks that could be causes when searched are performed during indexing.




git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-18 21:55:20 +00:00
orbiter
3a50b5aa04 enhanced object hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 14:19:29 +00:00
orbiter
1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
The result should be a less usage of new String() and less memory usage (since a String-encapsulated byte[] has 40 bytes overhead)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
orbiter
dde394a977 - shifted some computation out of synchronization to allow more concurrency
- removed synchronization where not necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 23:22:06 +00:00
orbiter
f204076d25 removed usage of temporary files: causes too much IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 22:17:18 +00:00
orbiter
48b9371735 changed balancer re-load counter. causes less blocking here doing intranet indexing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 20:57:26 +00:00
orbiter
0d04ab1422 new access tracking data type strategy; previous data types may have caused deadlocks of httpd when performing statistic cleanups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 16:18:04 +00:00
low012
fc43f3028e *) hopefully fixing NPE issue introduced in r6797
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:33:50 +00:00
orbiter
55d8e686ea performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:29:55 +00:00
orbiter
2e26744f4e more concurrency when normalizing RWI entries + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:47:57 +00:00
orbiter
555b333041 fix for wrong count of server processes. may fix non-access problems in some cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:34:16 +00:00
orbiter
4917f96729 fixes for some changes in SVN 6797 that caused NPEs when the bookmarks initialized
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 10:14:08 +00:00
low012
dff660441a *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:31:16 +00:00
low012
15d9ea8375 *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:25:15 +00:00
low012
2bc459252e *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:16:09 +00:00
orbiter
67ec58d8e7 search performance enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-12 07:31:43 +00:00
hermens
4ec0092677 more null == proxy fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 18:31:12 +00:00
hermens
2f90f0ad56 Remove asserts blocking proxy use cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 15:12:39 +00:00
sixcooler
eb2a4bb555 workaround(?) for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770&start=0&st=0&sk=t&sd=a&hilit=DefaultCharsetStringPart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 00:21:07 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
low012
b97ad0f380 *) some minor changes for better code readability
*) added more SVN properties

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-05 12:37:33 +00:00
orbiter
ba51d140e1 added more info in assert in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6782 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-28 22:59:19 +00:00
orbiter
a85c5bb8a7 added support for multiple (fail-over) network definition locations when http-locations are given. multiple locations can be given with a comma-separated list of urls pointing to the network definition file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 23:15:15 +00:00
orbiter
9b3840cb66 performance hacks for the template engine + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 22:52:48 +00:00
orbiter
5c10f8bc5f enhanced latest hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 07:19:49 +00:00
orbiter
b3238bec83 performance hack for httpd
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 07:09:55 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass value as byte[], not as String. This should cause that less
byte[] <-> String conversions are made during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
749ffbd642 - added another catch case for the index dump and index merge process that should cause non-blocking behavior in case that index dump and/or index merge caused any unexpected exception.
- reverted SVN 6766, this is too dangerous (may cause unexpected memory usage) and should not be necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:46:40 +00:00
orbiter
9ddb8e4a43 set an option for the java-internal image parser that prevents that the image is cached using the file-system in a temporary file. This should speed up image parsing during image indexing dramatically and should also cause better performance when showing the yacy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
orbiter
e12f1fd821 - added setting of access rights for executable scripts after auto-installation
The correct access right was missing expecially for bin/apicall.sh

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6769 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 09:51:01 +00:00
orbiter
95f31da8da increase dump cache queue length from 1 to 2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-24 20:36:35 +00:00
orbiter
6c093d6aed - enhanced domain navigator computation
- fixed domain navigator content in case that a mustmatch constraint was given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075 using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: should speed up pre-sorting of search results and should cause richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
orbiter
90dd197ae7 - no latency for local crawls
- catch interrupted exception during 'fast' crawls in workflow processor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:18 +00:00
orbiter
bfb518cd47 some refactoring to get the LoaderDispatcher a little bit more independent from the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:28:03 +00:00
orbiter
c855fc48c6 only load robots.txt for http and http protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:15:11 +00:00
orbiter
748abfcffa added patches to prevent yacy-protocol DoS settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 15:31:15 +00:00
orbiter
e820ed061a avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
orbiter
de88200e11 - added Byte Order Mark recognition to serverObjects
The BOM character FEFF may appear at the beginning of strings if some browsers append the characters %EF%BB%BF to input values.
see http://en.wikipedia.org/wiki/Byte_order_mark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6748 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 10:58:40 +00:00
orbiter
3300930fc5 - (almost) fixed FTP crawler
- integrated/fixed SMB crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
9623d9e6d2 added a smb loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
48995e71c4 added soft-auth to general authentication scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 00:07:17 +00:00
orbiter
72f00dee59 removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
57e1eae95e longer time-out for url fetching .. may help to show all that links that the statistic say for a search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6727 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 22:23:08 +00:00
orbiter
9e639603e3 after frequent occurrences of 100% CPU usages and permanent blockings I try to disable a function in a method that may cause the problem when calling an external library (apache http client 3.x). The thread dump that shows the problem is attached here.
at java.lang.StringCoding.encode(StringCoding.java:266)
	at java.lang.String.getBytes(String.java:946)
	at org.apache.commons.httpclient.util.EncodingUtil.getAsciiBytes(EncodingUtil.java:237)
	at org.apache.commons.httpclient.methods.multipart.Part.sendDispositionHeader(Part.java:220)
	at org.apache.commons.httpclient.methods.multipart.Part.send(Part.java:308)
	at org.apache.commons.httpclient.methods.multipart.Part.sendParts(Part.java:385)
	at org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity.writeRequest(MultipartRequestEntity.java:164)
	at de.anomic.http.client.Client.zipRequest(Client.java:364)
	at de.anomic.http.client.Client.POST(Client.java:339)
	at de.anomic.yacy.yacyClient.wput(yacyClient.java:285)
	at de.anomic.yacy.yacyClient.transferURL(yacyClient.java:1053)
	at de.anomic.yacy.yacyClient.transferIndex(yacyClient.java:942)
	at de.anomic.yacy.dht.Transmission$Chunk.transmit(Transmission.java:200)
	at de.anomic.yacy.dht.Dispatcher.storeDocumentIndex(Dispatcher.java:397)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:103)
	at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:66)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:637)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:19:23 +00:00
orbiter
4144927d94 show less errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:02:08 +00:00
orbiter
b88f5fbb4b slightly changed crawling policy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:46:08 +00:00
orbiter
7684a575c4 fix for deletion of error database each time when YaCy starts up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 00:33:39 +00:00
orbiter
f561e340c6 show more results of single domains when not authorized fully (up to 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 00:12:58 +00:00
orbiter
c4bdb1e7f2 added one more option in ViewFile to show an iframe like for the orginal web page content but using the cache than the direct link to the content in the web. Upgraded the very old and previously not any more used CacheResource_p servlet to a new and working version.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 23:41:51 +00:00
orbiter
c09a995930 better logging of double occurrences of urls in the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 20:31:30 +00:00
orbiter
884b262130 - added a new Wiki Namespace Navigator
- some redesign of Navigator data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:25:49 +00:00
orbiter
617dfbbd06 allo 'authorization by encoded password' also if requesting client is not from localhost but from the same host as yacy is running on.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 16:03:55 +00:00
orbiter
599c3766c4 added authentication to automated API call
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 14:10:03 +00:00
orbiter
727dd9b193 - fixed a bug in robots.txt parser
- moved storage of robots.txt entries to WorkTables, so it is now possible to browse the robots entries with the table browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 11:58:07 +00:00
orbiter
54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
- added evaluation and indexing prevention in case that a noindexing is given in a html file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-03 23:32:56 +00:00
orbiter
46c4f8b68a better look-ahead into the crawl queue: show more on crawl monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 23:11:58 +00:00
lotus
7b546415dc added svn6695 for windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 14:58:53 +00:00
orbiter
f175f9a2d3 changed way how number of search requests are counted:
so far only search requests at the remote search interface had been counted.
This was done to protect the privacy of searchers, because counting was not done and published at the own search interface.
This caused that no search requests of robinson peers had been counted, becuase they cannot be counted at remote peer.
This change introduces a distinction of locally done search requests at the local search interface from search requests that are on the local interface but had been submitted from a remote IP without authentication.
Now 3 counters are maintained:
- partial count of remote searches
- total count of local searches on robinson peers from non-authenticated clients
- total count of local searches on robinson peers from localhost or authenticated clients
In the global statistic of search requests now the first two counters of the three cases are added
Because we habe a large number of robinson peers with a large number of remote non-authenticated requests the statistic should show at least three times of the number of search requests.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 13:53:55 +00:00
orbiter
84222e3b4f fix for auto-updater: delete old libraries before copy of new one
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 13:46:50 +00:00
orbiter
93b7ddc27d fix for http://forum.yacy-websuche.de/viewtopic.php?p=19376#p19376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-21 22:49:35 +00:00
orbiter
8030ed3319 self-healing for lost crawl profile handles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-18 21:55:45 +00:00
orbiter
e3e5e05ec2 fix for problem in ranking setting which was caused by the introduction of a toString() method in serverObjects
see also: http://forum.yacy-websuche.de/viewtopic.php?p=19310#p19310

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-17 21:31:08 +00:00
orbiter
e3ccfb54aa fix for display problem in Firefox on MacOS X
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-17 09:08:16 +00:00
orbiter
564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-16 23:06:04 +00:00
orbiter
ef62d017e5 integrated session id filtering for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:15:17 +00:00
orbiter
d8d9984913 added framework for session id filtering (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:30:41 +00:00
orbiter
2bc36de336 - fix for bug in svn 6669
- cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:06:13 +00:00
orbiter
d378ca4604 better handling of concurrency in seed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:57:35 +00:00
orbiter
6538043d89 fix for http://forum.yacy-websuche.de/viewtopic.php?p=19189#p19189
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:45:31 +00:00
lotus
945e0ba5a5 allow global search if res. observer disabled index transmission
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 17:14:16 +00:00
lotus
8faeedd99a not a fix! for:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2679

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 09:33:30 +00:00
lotus
11188cd7eb resource observer now uses the Java 6 method to check for free space. thus, disk observing now needs Java 6 installed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-06 18:48:06 +00:00
orbiter
be18b5d8cd fix for 'cannot switch back to default language'-bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 23:53:02 +00:00
orbiter
74e736c903 missing file for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6645 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 14:52:58 +00:00
orbiter
308a973503 refactoring of tables data organisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 11:26:23 +00:00
orbiter
8a76f38d26 Added a new steering servlet that can be used to repeat actions that had been made on the yacy interface. This can be used to:
- start again a previously started crawl
- submit settings (again). This option will be used to transmit
  all settings of one peer to another peer if the remote-peer
  steering function is ready
This steering framework will also be used for a 'schedule-everything'
which will also include a new scheduler for crawling.
  

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-03 09:31:12 +00:00
orbiter
840527689b more simplification of bookmark class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 23:04:52 +00:00
orbiter
d77782a8d5 removed bookmark tags file, tags are now stored only in RAM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6638 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 22:44:59 +00:00
orbiter
ada0ce9de3 refactoring of bookmarks: there is a big performance problem in the bookmarks code and furthermore the bookmarks
will loose its leading role for the re-crawl funtion when the new api tables will work. To be prepared for a replacement
of such functions the bookmark class is re-organised.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 22:18:56 +00:00
orbiter
a131ebbcb5 one more fix for NPE, see
http://forum.yacy-websuche.de/viewtopic.php?p=19010#p19010

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 11:28:37 +00:00
orbiter
24060885b6 - added Tables abstraction in data.Tables.java
fix for
http://forum.yacy-websuche.de/viewtopic.php?p=18910#p18910
http://forum.yacy-websuche.de/viewtopic.php?p=18894#p18894
http://forum.yacy-websuche.de/viewtopic.php?p=18814#p18814


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 18:02:09 +00:00
orbiter
7fdf59a77f misc NPE check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 15:59:24 +00:00
orbiter
a512aef6ad fix for http://forum.yacy-websuche.de/viewtopic.php?p=18918#p18918
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 10:33:20 +00:00
orbiter
3889438db6 fix for bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 14:20:24 +00:00
orbiter
23bcca07a3 removed directly linked servlets that had been there to test memory failures that appeared in that servlets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 13:30:20 +00:00
orbiter
69c29acb6e no exception thread dump if parser cannot parse becuase that mime-type/extension is in the deny-set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6611 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 13:21:37 +00:00
orbiter
8ce936bcdd added an api recording function: it shall be possible to record
all operations on YaCy in a database that should make it possible
1) to re-create a setting on fresh peers
2) to transmit a setting from one peer to another
3) to re-create crawl starts after a complete deletion of the index
This functionality will also support
4) scheduled re-crawls (new implementation)
To implement this, a new database structure has been crated that stores maps into blob heaps. to encode maps the b-encoding technique was used (this is the same encoding that torrent files use)
- added a b-encoder
- enhanced the b-decoder
- added a b-encoded map heap data structure
- added a table organisation based on b-encoded heaps
- added a servlet to maintain such tables (see Tables_p.html)
- integrated the servlet into the Advanced Settings menu
- added an api recording based on the new tables

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-21 22:06:03 +00:00
orbiter
e80e060ca6 - increased thread priority for server threads
- decreased thread priority for crawler threads

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-19 11:29:22 +00:00
orbiter
234f733a3d - relocation of seed db is better for network switch than re-initialization because of the embedding of the peers object in other objects
- small refactoring of blacklist interface code to remove PMD warnings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-18 00:07:20 +00:00
orbiter
473b11033d fixed network switch process - crawling did not work after a switch before this fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6592 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-17 23:33:15 +00:00
orbiter
fd7b348973 some fixes for the network switch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-17 22:07:08 +00:00
orbiter
f6731c6240 more logging etc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-17 00:41:50 +00:00
orbiter
d9169cc6c3 increased proxy load time-out from 30000 to 60000 milliseconds
according to http://forum.yacy-websuche.de/viewtopic.php?p=17782#p17782

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-14 10:42:05 +00:00
orbiter
938e806182 tried to fix date problem that may have prevented that foreign peers stay in the network
- removed unused code
- removed possibly wrong utc difference correction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6581 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 20:01:46 +00:00
orbiter
bd05e57d3b fix for http://forum.yacy-websuche.de/viewtopic.php?p=18563#p18563
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6580 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 18:41:33 +00:00
orbiter
5df628a2a4 - added BEncoder class
- added BEncodedHeap class that encodes B data structures and stores that to a heap
- refactoring of MapView, this is now named MapHeap to fit into the naming scheme of the BEncodedHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 16:21:37 +00:00
orbiter
82f57f79e5 more PMD enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 00:23:07 +00:00
orbiter
5d930c96f0 more fixes to search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 00:04:37 +00:00
orbiter
8c520f128d reverted a change in ranking process committed this afternoon
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 20:56:37 +00:00
orbiter
a06f7ddb33 more PMD recommendations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 20:53:19 +00:00
orbiter
18172451a0 better search computation:
- increased sort limit, now 3000 entries, before: 1000
  this should cause that more results can be shown in case
  of strong limitating constraints, like domain navigation
- enhanced the sort process
- check against domain navigator bugs
- fix in sort stack
- showing now all naviagtion pages at first search (not only next page)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6569 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 15:01:44 +00:00
orbiter
d126d6c1b5 renamed the servlet WatchCrawler_p to Crawler_p
this was done because that servlet may be used for wget/cronjob
triggered crawl starts and it appears to be confusing that the
name of the crawl start servlet looks like a pure monitoring tool.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6568 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 10:05:28 +00:00
orbiter
66c0a8e849 more PMD recommendations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 22:18:38 +00:00
orbiter
909a4f91c7 added a logging output for crawl starts that shows the URL that can be used to start the crawl again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 18:10:39 +00:00
orbiter
bc96d74813 - clean-up of robots.txt parser
- added 'yacybot' as key to recognize robots.txt entries for YaCy
- removed unused method to get robots.txt from database

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 16:36:30 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
lotus
eac2daf2e8 * reenable DHT if yet enough memory is available
* reset treshold on reconfiguratoin
(thanks to sixcooler)

* display status message in web interface

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6562 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 19:04:43 +00:00
orbiter
d77a8f3b3e added some modifications recommended by PMD for better performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6560 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 01:40:26 +00:00
orbiter
d1973bae2a code cleanup: removed unused code and unused methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6559 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 00:42:49 +00:00
orbiter
a3b8b7b5c5 some redesign of the main menu structure:
- moved all index generation servlets to it's own main menu item, including proxy indexing
- removed external index import because this operation is not recommended any more. Joining an index can simply be done by moving the index files from one peer to the other peer; they will be merged automatically
- fix to prevent endless loops when disconnecting http sessions
- fix to prevent application of bad blacklist entries that can cause a 'Dangling meta character' exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 00:10:43 +00:00
lotus
ab3cf60dbe fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6557 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-09 14:10:27 +00:00
orbiter
7f20963b41 add-on to last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-09 00:17:39 +00:00
orbiter
eeca2ded92 fix for http://forum.yacy-websuche.de/viewtopic.php?p=18500#p18500
- catch uncatched OOM
- less wasting of memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-09 00:08:16 +00:00
orbiter
bb2e03761c - fix for deadlock with 100% CPU during search
- fix for failure of ranking because of a ConcurrentModificationException

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6553 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 12:41:43 +00:00
orbiter
dff4f95c78 some patches to get the torrent parser working
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 00:42:12 +00:00
low012
82198acc06 *) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6537 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-28 11:06:49 +00:00
low012
b75547fc60 *) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-28 11:05:50 +00:00
orbiter
57d729e377 fix for negative numbers in network statistic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6532 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-21 11:36:48 +00:00
orbiter
4ac4fe952c patch for npe in bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6530 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-18 10:14:05 +00:00
orbiter
d548bd41ad fix for a npe during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6528 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-15 12:15:39 +00:00
orbiter
37245430c3 fix for NPE during DHT RWI selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6527 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-15 00:02:10 +00:00
orbiter
a37878b7d5 url parser regex performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-10 14:40:32 +00:00
orbiter
b527d2ebfa fix for media search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6522 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-09 23:47:37 +00:00
orbiter
362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6521 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-09 23:27:26 +00:00
orbiter
8281e29963 - more configuration for profiling graph (number of events)
- more logging for a shutdown: print reason and accessing IP into log


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-08 14:25:51 +00:00
f1ori
5f0f6b71b4 * revert last commit, something is more broken than before
* UTC timestamps and lastseen-properteries still needs some debugging


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-06 21:54:32 +00:00
f1ori
8c8b642eba * fix timezone problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6518 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-06 21:03:12 +00:00
orbiter
4782d2c438 fix for search bug that appeared when looking at page 3 of results or further
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6515 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-03 12:25:03 +00:00
orbiter
29fde9ed49 better control of ranking order in sort stack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-03 00:36:07 +00:00
orbiter
66923ebc6c - modified method in RequestHeader that delivers the host name of requester: no more reverse domain lookup (may have killed interface performance in some cases)
- added logging output for shutdown servlet: show ip of requester of the shutdown

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 22:56:28 +00:00
orbiter
e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 14:01:19 +00:00
orbiter
4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where an size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 00:37:59 +00:00
orbiter
f4946eaf27 - better thread dump
- suppressed one server exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-01 22:53:36 +00:00
orbiter
9743b70d1c disabled keep-alive of server, not really needed for speed but a cause for much trouble and memory occupancy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6508 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-01 19:14:16 +00:00
orbiter
491ba6a1ba - some refactoring in workflow
- some refactoring in search process
- fixed image search for json and rss output
- search navigation on bottom of search result page in cases where there are more than 6 results on page
- fixes for number of displayed documents
- disabled pseudostemming

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6504 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-24 11:13:11 +00:00
orbiter
969123385b added json and rss output for image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-23 16:10:50 +00:00
orbiter
d183f8d980 refactoring (moved code from ContentTransformer to TemplateEngine)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 14:57:00 +00:00
orbiter
23aef43786 - better synchronization in SortStack
- better ThreadGroup organization
- less worker threads for media search (64 was too much...)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 14:35:33 +00:00
orbiter
7b1f5b0430 - better media search ranking
- better concurrency with enhanced synchronization in sort stack

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 13:19:12 +00:00
orbiter
4df88a4e7a - fixes for missing or bad hashCode computation
- fixes for bad equals() methods that had not been used by hash maps and therefore some classes did not work as objects in hash maps.
- this may also affect some cases where double-checks should have been, but did not work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 12:11:56 +00:00
orbiter
dbdf2570ba added comparator and more fixes for SortStack/SortStore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 03:30:48 +00:00
orbiter
1dff620181 Better implementation of SortStack and SortStore and adoptions in all using classes to implement the necessary Comparable interface and hash code computation.
The better SortStack performance affects crawling and image search speed and quality.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-19 13:49:28 +00:00
orbiter
fe41a84330 some enhancements in web caching: avoid double loading of response metadata and/or content
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6491 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-19 10:17:26 +00:00
orbiter
06d0dcde20 more enhancements to image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6490 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-19 00:43:42 +00:00
orbiter
4c6312d103 enhanced image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-18 23:56:05 +00:00
orbiter
2d8f3ee301 some performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-18 16:03:28 +00:00
orbiter
94b2a664f3 - use a static DiskFileItemFactory (one instantiation is enough)
- use more memory for the DiskFileItemFactory to avoid IO when POST commands come

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-18 15:05:51 +00:00
orbiter
013f337d3f - avoid unnecessary host name lookups for localhost
- avoid unnecessary reverse domain name lookups for remote access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-16 23:00:54 +00:00
orbiter
20c5d78a5c fix for a ConcurrentModificationException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-11 23:31:12 +00:00
orbiter
7144d2df6e added crawlReceipt servlet as individual class to examine OOM problem as documented in
http://forum.yacy-websuche.de/viewtopic.php?p=18120#p18120

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-11 16:12:00 +00:00
orbiter
29fe436e36 - fixed post-ranking including prefer mask
- enhanced a core database access method / less wasted ram

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-09 19:14:51 +00:00
orbiter
5399d1e2bc refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6471 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-08 22:58:57 +00:00
orbiter
1fa0ac26e9 better protection against NPEs during search/ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-07 10:58:33 +00:00
orbiter
4c99d4683d possible fix for lost crawl profile handles: clean-up job did wrong measurement to see if crawl is still running.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 23:15:20 +00:00
orbiter
18b21eaffe small fixes to search default values and server logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 19:13:35 +00:00
lotus
6edc168cfe option to disable dht by memory limit:
memory.acceptDHT in kbytes
not yet pre-enabled, will clear on every startup
please review since this could break dht in freeworld

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 19:13:30 +00:00
orbiter
4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 20:28:37 +00:00
orbiter
e3025ee691 - new icon for OAI-PMH loading action
- added many stack trace outputs for exceptions in crawl profile handler to find the 'missing profile handle' bug
- catched one more timeout exception in httpd file loader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 16:40:15 +00:00
orbiter
f0b8db93f0 - more abstraction of serverCore thread access
- no more keep-alive when number of connections exceeds 1/2 of the allowed number of connection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 14:54:43 +00:00
orbiter
2889b9426e missing code for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 12:03:19 +00:00
orbiter
b6a8887ff5 better handling of running sessions without explicit hashtable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:59:15 +00:00
orbiter
1dc7ea986a added a dynamic keep-alive time-out for http server sessions:
if there are many concurrent server sessions, the timout is decreased.
This should avoid a situation where the clean-up thread is too
late to stop running http sessions that should be terminated
before the maximum number of server sessions is reached.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:01:09 +00:00
orbiter
b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
- added monitoring for retrieved records

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6444 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-02 11:53:14 +00:00
lotus
79251e6f60 configurable disk space hardlimit for dht
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-31 19:12:53 +00:00
orbiter
a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
- extended the OAI-PMH test applet and integrated it into the menu. Does still not import OAI-PMH records, but shows that it is able to read and parse this data
- some redesign in ZURL storage: refactoring of access methods, better concurrency, less synchronization
- added a limitation to the LURL metadata database table cache to 20 million entries: this cache was until now not limited and only limited by the available RAM which may have caused a memory-leak-like behavior.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-31 11:58:06 +00:00
orbiter
30f108f97d added stub of oai-pmh importer (not working yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-30 15:58:04 +00:00
orbiter
77c99e500f added more control over memory allocation
should avoid some of the OOMs

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-27 15:25:48 +00:00
orbiter
52470d0de4 - fix for xls parser
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-22 22:38:04 +00:00
orbiter
5e8038ac4d - refactoring of blacklists
- refactoring of event origin encoding


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 20:14:30 +00:00
orbiter
26fafd85a5 - more refactoring
- fixed problem with parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 15:12:34 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
a8ce192f63 - shifted main classes to new package net.yacy
- fixed some bugs in last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 01:38:07 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
hermens
0fd9540866 Configuration of HTTPDProxyHandler logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-17 14:04:18 +00:00
orbiter
cee7a05ff2 - de-serialized the pdf parser
- added fail callback for file indexer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 10:47:29 +00:00
orbiter
9db928ce53 replaced fontbox 0.7.3 with fontbox 0.8.0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 09:51:16 +00:00
orbiter
c2272785c7 - fix for xlsx and pptx parsing
- less exception logging for swf parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6413 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 19:15:38 +00:00
orbiter
c864901087 - moved httpd.mime to defaults path
- some documentation fixes
- adopted a default setting for the search window: moves css setting to base.css
- some enhancements for the DocumentIndex class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 13:29:09 +00:00
low012
8829ec5f18 *) made sure that &nbsp; is replaced with a space and not just deleted in CharacterCoding.java
*) added annotations and made minor changes to serverObjects.java
*) set subversion properties for several files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-13 20:57:56 +00:00
orbiter
6c347a37eb more options for DocumentIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-13 08:43:02 +00:00
orbiter
e7f18ba24b refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:24:42 +00:00
orbiter
ce8dc575ca refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:12:19 +00:00
orbiter
bea3b99aff moved table and util classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:14:19 +00:00
orbiter
bd876eb4b7 moved io classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:00:49 +00:00
orbiter
c0e0e1f422 moved blob classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6395 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 00:43:25 +00:00
orbiter
1e4f8b56ed accumulated classes from different packages into the new rwi package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6394 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 00:39:15 +00:00
orbiter
194da25a2f moved kelondro index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:32:08 +00:00
orbiter
4446acc8cd moved kelondro order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:22:22 +00:00
orbiter
f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
- moved here the logging classes as part of the new net.yacy.kelondro package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:13:30 +00:00
orbiter
ea473e32b8 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6390 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 22:27:50 +00:00
orbiter
735e2737e3 * added index segments
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and it will be possible to have a distinction between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning, many servlets must be adopted after this change, but all functions that had been there should still work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 14:44:20 +00:00
orbiter
09de5da74a once again a performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 18:26:54 +00:00
orbiter
2f6d88403e
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 18:10:56 +00:00
orbiter
d2615ea5a8 increased memory for scraper buffer to enhance parsing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 15:27:13 +00:00
orbiter
4bbbb74ec4 removed not necessary synchronization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 15:26:28 +00:00
hermens
67e5464cc2 Fix for SVN6380: x[] Arrays are unsuitable Keys for Maps without using a proper Comparator.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 12:55:36 +00:00
hermens
aeab8c7917 Prevent failed DHT attemps from overwriting newer peer info
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 00:17:29 +00:00
hermens
9324b5b6c5 Enhancements to DHT
- speed up deletion of containers when selscted from whole index
- correctly eliminate all references to unavailable URLs, not just the first encountered



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 00:03:16 +00:00
hermens
e49e2d75fe Limit the time Transmission.Chunks stay in the transmissionCloud by using a Map that iterates entires in insertion order.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-07 23:41:25 +00:00
orbiter
92db7c5d07 increased timeout for index retrieval
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-07 13:03:13 +00:00
lotus
386b9f35f6 activated resource observer for windows 7
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-07 06:20:24 +00:00
orbiter
6e0dc39a7d - some fixes to prevent blocking situations
- better logging for the crawler
- better default values for the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-06 21:52:55 +00:00
orbiter
51f2bbf04b possible fix for problem in http://forum.yacy-websuche.de/viewtopic.php?p=17655#p17655
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-06 09:56:14 +00:00
orbiter
f8371707e5 - possibly better termination for SplitTable
- better abstraction in DidYouMean

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-05 22:09:58 +00:00
orbiter
87780f2562 produce did-you-mean also for queries with more than one word
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-05 21:51:02 +00:00
orbiter
04a548a1e3 - temporary integrated the transferURL servlet as static class instead as a class that is called using reflection to investigate the OOM problems in that class
- fixes for numerous other problems
- removed dead code
- resdesign of the strings-method, which produces now less memory overhead and may help to prevent OOMs
- another fix for the deadlock problem in SplitTable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-05 20:11:41 +00:00
orbiter
ea427df944 fixed a worst case situation of the condenser which may cause a temporary full CPU load because of a bad data structure usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-05 08:26:55 +00:00
orbiter
3e38035389 fix for interrupted thread during has() property check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-02 10:55:40 +00:00
orbiter
5bd1c1d205 just added some comments that had been produced to learn about OAI-PMH
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-01 22:56:22 +00:00
orbiter
6aa474f529 - better logging for web cache access and fail reasons
- better Exception handling for web cache access
- distinction between access of web cache for proxy and crawler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-01 13:08:19 +00:00
orbiter
3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 22:11:00 +00:00
orbiter
58a00205d5 re-activated the emergency close when too many server connections exist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 14:29:43 +00:00
orbiter
c57d2070e6 more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 13:25:08 +00:00
orbiter
a995b95367 tried a fix for the httpd access bug (too many unclosed sessions)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 13:18:02 +00:00
orbiter
e1fba41cad better logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 21:52:17 +00:00
orbiter
2275f885a8 possible fix for concurrency problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 21:40:50 +00:00
low012
a6a3090c3d *) blacklist cleaner supports usage of regular expressions now
*) refacored BlacklistCleaner_p.java for better readability
*) moved check of validity of patterns to the Balcklist implementation since patterns might be valid in one implementation, but not in another
*) added method to check validity to Blacklist interface
*) fixed some minor issues like typos or wrong whitespaces
*) set subversion properties for a whole bunch of files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 21:28:49 +00:00
orbiter
5a93807781 improved web cache speed:
- removed one computation out of a synchronization
- removed one not necessary has() call


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:41:05 +00:00
orbiter
2e8b2867ff double performance of store method because it avoids one 'has'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:23:44 +00:00
orbiter
afda5b1adc new join method for indexes (not yet used)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:16:24 +00:00
orbiter
65b66c2c18 better handling of array files of length 0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:13:44 +00:00
orbiter
1957b5797a fix for seed generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:05:36 +00:00
orbiter
432154f725 new strategy for concurrent database index key retrieval
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-29 08:04:00 +00:00
orbiter
a11cd9f80f - removed reverse name lookup for http access logging (grr..)
- removed a synchronization in seed info string generation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-28 15:23:15 +00:00
orbiter
2e6bdce086 - added more logging to balancer
- changed balancer logic slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-27 22:35:22 +00:00
orbiter
1171a72006 fix for deadlock as seen in http://forum.yacy-websuche.de/viewtopic.php?p=17521#p17521
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-24 19:14:35 +00:00