4485 Commits

Author SHA1 Message Date
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
the cache shall no longer be accessed directly; all loading attempts shall go through the LoaderDispatcher.
To control cache usage, an enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods that did not take a cache strategy have been removed. This also affects the verify option
of the yacysearch servlet: after this commit, 'verify=false' does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved from the cache alone are presented. The search hit is still unverified, because its snippet was generated from the cache. If a snippet cannot be generated from the cache, verify=false means that the link is kept rather than rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
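The rule that every load goes through one dispatcher, with a cache strategy deciding whether the cache or the network is used, can be sketched as follows. This is a simplified illustration, not YaCy's code. The enum constants are assumed (the real CrawlProfile.CacheStrategy may define more or different values), and SketchLoaderDispatcher, its load method, and the network function are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Assumed strategy constants; the real CrawlProfile.CacheStrategy may differ.
enum CacheStrategy { NOCACHE, IFEXIST, CACHEONLY }

// Hypothetical dispatcher: callers never touch the cache map directly.
class SketchLoaderDispatcher {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> network; // stand-in for a real protocol loader
    int networkLoads = 0;                           // counts network accesses (for illustration)

    SketchLoaderDispatcher(Function<String, String> network) {
        this.network = network;
    }

    /** Returns the content of url, or null if the strategy forbids the access that would be needed. */
    String load(String url, CacheStrategy strategy) {
        if (strategy != CacheStrategy.NOCACHE) {
            String cached = cache.get(url);
            if (cached != null) return cached;                    // cache hit
            if (strategy == CacheStrategy.CACHEONLY) return null; // miss: never go to the network
        }
        String content = network.apply(url);
        networkLoads++;
        // NOCACHE only skips reading; fresh content is still stored for later cache-only requests.
        if (content != null) cache.put(url, content);
        return content;
    }
}
```

In this sketch a cache-only request returns null on a cache miss instead of going to the network; a verify=false snippet request would treat that null as "no snippet" and keep the link.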
orbiter
7e2d6fac12 patch for bad values during local search join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:31:00 +00:00
orbiter
2ddb952a5c added the (fixed and enhanced) secondary search process, which had been disabled for some time.
Searches for more than one word should now be improved and produce many more results.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:11:12 +00:00
orbiter
58035ef784 fix in snippet loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 19:36:11 +00:00
orbiter
986d4f34d9 added a consistency check for new queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 18:59:42 +00:00
orbiter
73f03e05ee fixed a bug in snippet fetch strategy: cache-only does not help if the resource can only be found on the web
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 15:25:25 +00:00
orbiter
fbf021bb50 redesign of index abstract processing - currently disabled until enough peers have the fix from SVN 6928
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 09:44:21 +00:00
orbiter
87087f12fe - reviewed the remote search process and enhanced some data structures and synchronizations here and there
- removed the concurrency overhead for the small number of index normalizations that occur during remote search
- removed the 'load only parseable' constraint for snippet fetch: some resources have no URL file extension and were therefore treated as neither parseable nor searchable, although they may become parseable after loading, once their MIME type is known
- this partly fixes the problems reported in http://forum.yacy-websuche.de/viewtopic.php?p=20300#p20300, but more changes are necessary to get all expected search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-17 11:59:40 +00:00
orbiter
7ddb70e7c6 new license for ai.greedy component: LGPL (nobody other than me has modified that code)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 22:16:03 +00:00
orbiter
b62fb38344 fix for the case where no release provider responds during auto-update (caused an NPE)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 18:43:45 +00:00
orbiter
de4f30bb2e UTF-8 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:22:31 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
989819a28c - reduced peer-ping time-out from 30 to 10 seconds
- no more retries for the peer ping (this is a test; let's see what happens)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 08:30:13 +00:00
orbiter
51332b787d reverted SVN 6869 as discussed with dulcedo in the car after LinuxTag:
the missing time-out may be the cause of locks during DHT-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 20:30:53 +00:00
orbiter
b03caaa57a better handling of OOM situations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 19:44:05 +00:00
orbiter
56ff9d5fd4 - extended news size from 512 to 1024 characters
- a new news db (news1024.db) will be created; the old one (news.db) can be deleted
- peers with an oversized news payload are no longer ignored (they may previously have been invisible because of their too-large news payload!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 10:43:47 +00:00
orbiter
353a924760 - changed default memory to 500m
- Xms is now lower than Xmx (let's see what happens)
- removed the default path for intranet crawl starts to avoid the confusion seen at LinuxTag
- added a time-out to the UPnP request (I have a new router which may need it)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-14 21:36:40 +00:00
orbiter
c71d829bb5 more time-out properties for http connection manager
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 23:37:43 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
a83772c71b fixes and enhancements for balancer:
- the crawl list for each domain now uses a HandleSet, which should use less memory than a LinkedList
- but: more entries are filled into the domain lists (all available entries)
- fixes to the selection criteria (best domain selection)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 09:30:23 +00:00
orbiter
9cde05418f fixed url crawl list display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-31 00:27:00 +00:00
orbiter
2eea806005 fewer errors in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 11:18:05 +00:00
orbiter
30b337fa9f fixes to balancer when crawling filesystem (problem was: host == null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 11:17:38 +00:00
orbiter
844853243a fixed balancer time guessing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 10:28:42 +00:00
orbiter
3f93a0cc8f redesign of remote proxy settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-26 00:01:16 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created the LGPL-licensed package cora ('content retrieval API'), which can be used by external applications without YaCy core elements because it has no dependencies on other parts of YaCy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
6950d8a33d fixes to SMB crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-23 01:17:44 +00:00
orbiter
bfdb9f4e06 extended statistics on Network servlet page
- added number of online peers for the last day and the last week
- changed design of statistic table
- network picture now shows exactly those peers that are counted in the statistic overview for one day

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-21 23:27:32 +00:00
orbiter
98c1d65415 - show up to 10 locations (maps) after search (instead of at most 5)
- order locations by (primary) population and (secondary) longitude (reverse ordering, both)
- added population data from GeoNames, since OpenGeoDB does not have that information
- changed default viewpoint of map to (30,15); this shows more land, with Europe in the center

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-21 08:18:04 +00:00
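The location ordering (population first, longitude second, both descending, at most 10 results) can be expressed with a chained Comparator. The Location class, its fields, and LocationRanking are hypothetical; the population values are assumed to come from GeoNames as described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical location record; field names are illustrative only.
class Location {
    final String name;
    final long population;
    final double lon;

    Location(String name, long population, double lon) {
        this.name = name;
        this.population = population;
        this.lon = lon;
    }
}

class LocationRanking {
    static final int MAX_LOCATIONS = 10;

    /** Population descending, then longitude descending; keeps at most MAX_LOCATIONS entries. */
    static List<Location> top(List<Location> all) {
        List<Location> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((Location l) -> l.population)
                .thenComparingDouble(l -> l.lon)
                .reversed()); // reverses both keys at once
        return new ArrayList<>(sorted.subList(0, Math.min(MAX_LOCATIONS, sorted.size())));
    }
}
```

Calling reversed() on the composed comparator flips both the primary and the secondary key, which matches "reverse ordering, both".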
orbiter
9842fab6e4 - fixes to query parameter
- replaced/removed the old-style search query attribute; the new one is 'query', following SRU

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-20 22:05:04 +00:00
orbiter
6ec9ced4cd - fix for multi-word search for locations
- changed description text to the 'title' entity (subject is a list of keywords and was very messy)
- added ViewFile in location pop-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-20 15:07:57 +00:00
orbiter
1defd580bc - added an option to the localization search to distinguish between a search for a location by the search word only and a search for the relation between web search results and locations found in their metadata fields
- used that to display two layers on the map: cities and search result locations
- added many marker graphics for the display of markers on the map
- some refactoring of the YaCy news code, plus bugfixes for the latest move from Tree to Table data structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-19 12:53:09 +00:00
low012
ad823a4716 *) minor changes (only cosmetics, no functional changes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-18 21:31:59 +00:00
low012
dcac90d2f9 *) removed unnecessary import
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-18 21:09:41 +00:00
orbiter
bd0a9df895 fix for bad location double check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6884 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-18 11:54:30 +00:00
orbiter
e43e61e502 added another geolocalization data source: GeoNames
- added downloader option in DictionaryLoader
- added generalization (interfaces and overarching localization)
- more abstraction using the libraries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 23:49:30 +00:00
orbiter
118d589eff replaced the very, very old data structure 'Records' with a simple table to fix the problem reported at
http://forum.yacy-websuche.de/viewtopic.php?p=20066#p20066

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 00:59:02 +00:00
orbiter
2a8f70f0ca - fix for caching of OSM tiles. For this fix to apply to your peer, please delete the crawl profiles
- fix for initial generation of crawl profiles (one more reason to delete your crawl profiles)
- more String -> byte[] migration
- more logging for cache store/hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 23:50:07 +00:00
orbiter
2126c03a62 - the download limit that can be set for the crawler no longer applies to non-crawler download tasks. This was necessary because the same procedure was used for other downloads, such as dictionary files, where a limit is not useful. The limit still applies to the indexer
- migrated the opengeodb downloader to a new version of the opengeodb-dump

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 18:30:11 +00:00
orbiter
3661cb692c added dictionary loader servlet that can be used to get the geolocalization file:
/DictionaryLoader_p.html
Will also be used for more dictionary files in the future

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 09:52:53 +00:00
orbiter
90fa8fd4d4 - support gpx file extension
- non-blocking location search (time-out handling was wrong)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-12 08:49:20 +00:00
orbiter
439b44be9e removed exit from computation in ReferenceContainerArray.get merge method
a warning is still given, but the method now continues computing as in normal operation
see also: http://forum.yacy-websuche.de/viewtopic.php?p=20038#p20038

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 23:36:40 +00:00
orbiter
7b880d73d0 adjustments to granted query size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 23:28:43 +00:00
orbiter
789c6b26ce added a location search service, available through the following servlet, for example:
http://localhost:8080/yacysearch_location.kml?query=berlin&maximumTime=2000&maximumRecords=100

This opens whichever application on your computer consumes KML data (probably Google Earth) and displays the search results as positions on a map

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 12:58:05 +00:00
orbiter
f23cbd2dab more bugfixes to date parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:32:46 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parsers
- fixed bug in metadata read procedure
- enhanced dublin core and rss parser to understand more fields, and more accurately
- enhanced url selection in case multiple urls are given in surrogates
- fix for condenser: failure when the last word does not end with a termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
6eba2cb96b fix in bmp parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-09 13:27:58 +00:00
orbiter
c45117f81f fixed dates in metadata
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-08 22:09:36 +00:00
orbiter
0a5fd15703 :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:06:31 +00:00
orbiter
ac16f582aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=20017#p20017
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:04:30 +00:00
orbiter
a7d038bb7a The OAI ListFriends source list is now configurable: just add the sources to defaults/oaiListFriendsSource.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 10:01:37 +00:00
orbiter
7ab207d93a better presentation of search result metadata and fixes to htcache loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-04 20:57:09 +00:00
orbiter
5fbf866cae - fixed resumption token generation for oai-pmh import
- relaxed dublin core parsing: the dc:reference tag may replace dc:identifier if the latter does not contain a valid URL
- the completeRecords number is now parsed and shown in the download list of the OAI import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-02 22:20:24 +00:00
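A minimal sketch of the relaxed Dublin Core rule, assuming "valid URL" means "parseable by java.net.URL" (the commit does not say which check YaCy uses). DublinCoreUrlChoice and chooseUrl are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

class DublinCoreUrlChoice {
    /** True if s parses as a URL with a known protocol (syntax check only, no network access). */
    static boolean isValidUrl(String s) {
        if (s == null) return false;
        try {
            new URL(s.trim());
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "urn:isbn:..." has no protocol handler
        }
    }

    /** Prefers dc:identifier; falls back to dc:reference if the identifier is not a valid URL. */
    static String chooseUrl(String identifier, String reference) {
        if (isValidUrl(identifier)) return identifier.trim();
        if (isValidUrl(reference)) return reference.trim();
        return null; // neither field yields a usable URL
    }
}
```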
orbiter
fc5efcc05a enhanced and fixed OAI-PMH import
- now importing the OAI-PMH server list from two sources
- simultaneous import from several servers (even > 2000)
- check buttons on OAI-PMH server list to select multiple servers for import start
- it is possible to select all servers at once for import
- imported XML data is gzipped after import from surrogate reader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-30 14:03:51 +00:00
sixcooler
c2098f9399 close unused connections if there are too many for DHT
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-29 23:38:50 +00:00
orbiter
455a763d7c performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-28 08:38:57 +00:00
orbiter
40a8d132d9 tried to fix 100% CPU when calling Balancer.top()
see also: http://forum.yacy-websuche.de/viewtopic.php?p=19978#p19978

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 22:37:50 +00:00
orbiter
b6cce08019 fixed a bug in rwi storage data size allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 22:22:16 +00:00
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused an out-of-bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
3aad50d38e :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-26 15:26:08 +00:00
orbiter
9edd38fbc5 connectionCount limit too low?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-26 15:24:47 +00:00
orbiter
7a05db0fcb fixes to prevent too many open connections
- create fewer connections at maximum (smaller httpc connection pool size)
- create fewer connections per host (2, the standard required by RFC)
- do not start DHT distributions if there are too many open connections
- clear open/idle connections earlier; run the cleaner more often

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 23:08:36 +00:00
orbiter
a9b9bf667b fix for http://forum.yacy-websuche.de/viewtopic.php?p=19910#p19910
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:48:30 +00:00
orbiter
b18a7606a0 some performance hacks and fixes after reading the dump in
http://forum.yacy-websuche.de/viewtopic.php?p=19920#p19920

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:37:36 +00:00
orbiter
2bc3cba6f1 - fix for 'do not write to cache' rule.
- do not read from the cache if the byte[] array is still filled from the response object (less IO)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-24 08:22:45 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

A finalize method prevents the memory used by the object that contains it from being reclaimed directly by the garbage collector. Instead, such objects are enqueued in a Java-internal finalization queue and can only be collected after the finalizer thread has processed them. This slows down all operations that use many objects with finalize methods.

This fix does not remove all finalize methods, only those in throw-away objects that are allocated many times. This should give better run-time performance and fewer OutOfMemoryErrors

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00
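The pattern behind this change: instead of relying on finalize() for cleanup, throw-away objects release their resources explicitly, so the garbage collector does not have to queue them for the finalizer thread first. The class below is hypothetical, and it uses Java 7 try-with-resources, which is newer than the Java version YaCy targeted in 2010.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Pattern removed by the commit (cleanup deferred to the finalizer thread):
//
//   @Override protected void finalize() throws Throwable { close(); }
//
// Replacement: release resources explicitly. Objects without finalize() can be
// reclaimed as soon as they are unreachable. ThrowAwayBuffer is hypothetical.
class ThrowAwayBuffer implements AutoCloseable {
    static final AtomicInteger open = new AtomicInteger(); // instances not yet closed
    private boolean closed = false;

    ThrowAwayBuffer() {
        open.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) { // idempotent: a second close() does nothing
            closed = true;
            open.decrementAndGet();
        }
    }
}
```

Usage: allocate such objects in a try-with-resources block, e.g. `try (ThrowAwayBuffer b = new ThrowAwayBuffer()) { ... }`, so cleanup happens deterministically when the block exits.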
orbiter
bfa35d6d20 possible fix for ZURL.list counter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 08:46:47 +00:00
orbiter
cff8ed134f added index check to prevent blocking in synchronization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-22 22:16:38 +00:00
orbiter
65f383e70b some adjustments to the httpc after testing with a very slow httpd
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-22 22:10:19 +00:00
orbiter
5ab5ac80fe fix for NPE in TextParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 22:35:47 +00:00
orbiter
b95ae2518b fix for assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 17:59:22 +00:00
orbiter
3247f0e901 fix for deadlocks caused by self-blocking access to TreeMap in concurrent environments. The TreeMap was replaced by a ConcurrentHashMap, with additional care that all strings are compared in lowercase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:46:02 +00:00
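A sketch of the replacement, assuming the old TreeMap compared keys case-insensitively (the commit does not say so explicitly). Because a ConcurrentHashMap cannot take a comparator, keys are normalized to lowercase on every access. CaseInsensitiveRegistry is a hypothetical name.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// A TreeMap with a case-insensitive comparator needs external locking in concurrent
// code; a ConcurrentHashMap is thread-safe but hash-based, so the keys themselves
// must be normalized instead.
class CaseInsensitiveRegistry<V> {
    private final ConcurrentMap<String, V> map = new ConcurrentHashMap<>();

    private static String norm(String key) {
        return key.toLowerCase(Locale.ROOT); // Locale.ROOT avoids locale-specific case rules
    }

    V put(String key, V value)         { return map.put(norm(key), value); }
    V get(String key)                  { return map.get(norm(key)); }
    V putIfAbsent(String key, V value) { return map.putIfAbsent(norm(key), value); }
    int size()                         { return map.size(); }
}
```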
orbiter
027b971bde fix for concurrent quicksort: catch jobs from ThreadPoolExecutor that had been rejected because of full processing queues.
Uncaught rejected jobs may have been the cause of blocking and freezes when the system was overloaded during heavy processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:44:59 +00:00
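A sketch of a concurrent quicksort that handles rejected jobs, assuming a ThreadPoolExecutor with a bounded queue and the default AbortPolicy, which throws RejectedExecutionException when the queue is full. This illustrates the technique; it is not YaCy's sorting code. A rejected sub-sort runs in the calling thread, and a sub-sort still waiting in the queue is pulled back and run locally, so no thread blocks on a job that no worker will ever pick up.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ConcurrentQuicksort {
    private static final int THRESHOLD = 1000; // ranges smaller than this are sorted sequentially

    // Small pool with a bounded queue: under load, execute() throws RejectedExecutionException.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 1, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(4));

    /** Sorts a in place; the pool is shut down afterwards, so each instance sorts once. */
    void sort(int[] a) {
        try {
            sort(a, 0, a.length - 1);
        } finally {
            pool.shutdown();
        }
    }

    private void sort(int[] a, int lo, int hi) {
        if (hi - lo < THRESHOLD) {
            if (lo < hi) Arrays.sort(a, lo, hi + 1);
            return;
        }
        int p = partition(a, lo, hi);
        FutureTask<Void> left = new FutureTask<Void>(() -> sort(a, lo, p - 1), null);
        boolean queued = true;
        try {
            pool.execute(left);
        } catch (RejectedExecutionException e) {
            queued = false; // do not drop the job: sort the left part in this thread
            left.run();
        }
        sort(a, p + 1, hi);
        // If no worker has started the job yet, take it back instead of blocking on it.
        if (queued && pool.remove(left)) left.run();
        try {
            left.get(); // waits only for a job that is already running or done
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi); // middle element as pivot
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

THRESHOLD and the pool and queue sizes are arbitrary values chosen for the sketch; the small queue only makes rejections likely enough to exercise the fallback path.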
orbiter
8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 11:29:27 +00:00
sixcooler
13f5b8e7ba fix for storing/getting bookmark-folders
called by Quix0r

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6825 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 17:55:59 +00:00
orbiter
7b69d79727 enhanced remove() operation: in many cases it is not necessary to return the removed object to the caller.
For such cases the delete() operation was introduced, which can be much cheaper because it does not need to read the removed content or create objects to hold it.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 14:47:41 +00:00
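The remove()/delete() split can be illustrated with a hypothetical in-memory index. In a file-backed table, remove() would also have to read the row from disk to return it, which is where delete() saves the most; here, the copy in remove() only stands in for that cost. SimpleIndex and MapIndex are not YaCy classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index interface illustrating the two removal styles.
interface SimpleIndex {
    byte[] remove(String key);  // returns the removed row, or null if absent
    boolean delete(String key); // only reports whether something was removed
}

class MapIndex implements SimpleIndex {
    private final Map<String, byte[]> rows = new HashMap<>();

    void put(String key, byte[] row) {
        rows.put(key, row);
    }

    @Override
    public byte[] remove(String key) {
        byte[] row = rows.remove(key);
        return row == null ? null : row.clone(); // builds a new object for the caller
    }

    @Override
    public boolean delete(String key) {
        return rows.remove(key) != null; // no copy, nothing handed back
    }
}
```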
orbiter
93ea0a4789 enhanced remove operation in search consequences (which are triggered when the snippet fetch proves that the word has disappeared from the page that was stored in the index)
- no direct deletion of references during search (deferred until after the search)
- bundling of all deletions for the references of a single word into one remove operation
- enhanced remove operation by ensuring that the collection is stored sorted (experimental)
- more String -> byte[] transition for search word lists
- clean up of unused code
- enhanced memory allocation of RowSet objects (uses slightly less memory, some of which was wasted before)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 13:45:22 +00:00
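A sketch of the deferred, bundled deletion described in the first two points, assuming an index that maps a word to the set of URL hashes referencing it. DeferredReferenceDeletion and its methods are hypothetical; YaCy's real reference containers are not modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// During a search, stale references are only recorded; after the search, all stale
// references of one word are removed together in a single operation.
class DeferredReferenceDeletion {
    private final Map<String, Set<String>> pending = new HashMap<>();

    /** Called during search when a snippet fetch shows that word is gone from page urlHash. */
    synchronized void schedule(String word, String urlHash) {
        pending.computeIfAbsent(word, w -> new HashSet<>()).add(urlHash);
    }

    /** Called after the search; returns the number of bundled per-word remove operations. */
    synchronized int flush(Map<String, Set<String>> index) {
        int operations = 0;
        for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
            Set<String> refs = index.get(e.getKey());
            if (refs != null) {
                refs.removeAll(e.getValue()); // one removal per word, not per reference
                operations++;
            }
        }
        pending.clear();
        return operations;
    }
}
```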
orbiter
7a59012632 fix for NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 07:43:48 +00:00
orbiter
1a6c2f77b4 fix for NPE in statistic servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 00:08:43 +00:00
orbiter
64f29f990e a collection of performance hacks and code cleanup:
- removed usage of URL-Caches which could have been a memory leak
- removed unused classes and methods
- removed unnecessary synchronizations
- added synchronization hacks where possible
- fine-tuned crawling speed to prevent balancer IO
- fixed a bug in IODispatcher that may have prevented any merges from being done
- reduced number of parameters in very often called methods (compare methods)
- reduced complexity of data structures of the now massively used HandleSet class
- reduction of new String() and getBytes() usage / new methods to support this transition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-19 16:42:37 +00:00
orbiter
8b8107b2a3 reduced IO-load and synchronization/blocking
- enhanced the Balancer performance when building new domain stacks using a new Table buffer
- added the new Table buffer BufferedObjectIndex class
- changed order of access for LURL-read (preferring the segment over the crawl queues); this reduces blocking time on the balancer
- fixed PPM setting in Crawler_p servlet (had doubled values)
- reduced synchronization in IndexCell because it is not necessary: reduced blocking during indexing/merging/dumping
- removed the did-you-mean cache in IndexCell because it caused too much overhead and more memory usage but was not very useful. This also reduces deadlocks that could be caused when searches are performed during indexing.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-18 21:55:20 +00:00
orbiter
ed07046870 flush only when > 3000 RWIs present + code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-16 16:07:19 +00:00
orbiter
3a50b5aa04 enhanced object hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 14:19:29 +00:00
orbiter
1a8a134e0c continuing the String-hash to byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
The result should be less use of new String() and less memory usage (since a String-encapsulated byte[] has 40 bytes of overhead)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
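Why a byte[] hash needs special handling as a map key: Java arrays use identity-based equals() and hashCode(), so two arrays with the same content are different HashMap keys. One common workaround, shown here as a generic sketch rather than YaCy's actual data structure, is a TreeMap with a content-based comparator. ByteHashes is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

class ByteHashes {
    /** Unsigned lexicographic byte order; for ASCII hashes this matches String order. */
    static final Comparator<byte[]> ORDER = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) return c;
        }
        return x.length - y.length;
    };

    /** A map keyed by byte[] content, without wrapping each key in a String. */
    static TreeMap<byte[], Integer> newMap() {
        return new TreeMap<>(ORDER);
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```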
orbiter
dde394a977 - shifted some computation out of synchronization to allow more concurrency
- removed synchronization where not necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 23:22:06 +00:00
orbiter
f204076d25 removed usage of temporary files: causes too much IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 22:17:18 +00:00
orbiter
48b9371735 changed balancer re-load counter. This causes less blocking here during intranet indexing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 20:57:26 +00:00
orbiter
650be3599f added a time-out to the RWI cache that flushes the cache if it has not been dumped for ten minutes. This additional dump criterion is necessary because some data sources repeat their vocabulary, so the number of words in the RWI cache may stop increasing while the number of references in the RWI set keeps growing. The RWI buffer is now flushed every 10 minutes, or later if a dump is already in progress at that time.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 20:30:34 +00:00
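The two dump criteria can be sketched as a small policy object: dump when the buffer holds many words, or when the last dump is at least ten minutes old, but never while a dump is still running. RwiFlushPolicy is hypothetical; the size threshold is a constructor parameter because this commit does not name one, and the clock is passed in so the logic can be tested.

```java
import java.util.concurrent.TimeUnit;

class RwiFlushPolicy {
    static final long MAX_AGE_MILLIS = TimeUnit.MINUTES.toMillis(10);

    private final int maxWords; // size criterion; the threshold value is an assumption
    private long lastDumpMillis;
    private boolean dumpRunning = false;

    RwiFlushPolicy(int maxWords, long nowMillis) {
        this.maxWords = maxWords;
        this.lastDumpMillis = nowMillis;
    }

    /** Should a dump start now? A running dump postpones the decision ("or later"). */
    synchronized boolean shouldDump(int wordCount, long nowMillis) {
        if (dumpRunning) return false;
        return wordCount > maxWords || nowMillis - lastDumpMillis >= MAX_AGE_MILLIS;
    }

    synchronized void dumpStarted() {
        dumpRunning = true;
    }

    synchronized void dumpFinished(long nowMillis) {
        dumpRunning = false;
        lastDumpMillis = nowMillis;
    }
}
```

The time criterion is what catches sources with a repetitive vocabulary: their word count stays below the threshold, but the buffer is still written out once the ten minutes have passed.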
orbiter
ff6cf24b80 replaced RowSetArray in ObjectIndexCache with RowSet to reduce complexity in MergeIterator. This complexity caused too much computing overhead when the RowSetArray had become very large.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 19:26:51 +00:00
orbiter
0d04ab1422 new access tracking data type strategy; the previous data types may have caused httpd deadlocks when performing statistics cleanups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 16:18:04 +00:00
low012
fc43f3028e *) hopefully fixing NPE issue introduced in r6797
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:33:50 +00:00
orbiter
55d8e686ea performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:29:55 +00:00
orbiter
2f181d0027 introduced concurrency in HTCACHE storage compression
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 16:22:09 +00:00
orbiter
2e26744f4e more concurrency when normalizing RWI entries + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:47:57 +00:00
orbiter
555b333041 fix for wrong count of server processes. May fix problems where the peer could not be accessed in some cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:34:16 +00:00
orbiter
aa083fc45c attempt to fix the OOM problem in cases where memory is not really exhausted.
See also http://forum.yacy-websuche.de/viewtopic.php?p=19835#p19835

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:39:54 +00:00
orbiter
70e6222978 more concurrency during search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:12:36 +00:00
orbiter
4917f96729 fixes for some changes in SVN 6797 that caused NPEs when the bookmarks were initialized
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 10:14:08 +00:00
low012
dff660441a *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:31:16 +00:00
low012
15d9ea8375 *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:25:15 +00:00
low012
2bc459252e *) changes for better code readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 01:16:09 +00:00
low012
dc93cec3a8 *) Java 1.5 compatibility (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=2764)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 00:25:46 +00:00
orbiter
67ec58d8e7 search performance enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-12 07:31:43 +00:00
hermens
4ec0092677 more null == proxy fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 18:31:12 +00:00
hermens
2f90f0ad56 Remove asserts blocking proxy use cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 15:12:39 +00:00
hermens
ef467a0303 Another workaround for the second part of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770
This should prevent URLs with bad referrer entries from being dropped by transferURL or even crashing the whole Transmission$Chunk

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 13:57:46 +00:00
sixcooler
eb2a4bb555 workaround(?) for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770&start=0&st=0&sk=t&sd=a&hilit=DefaultCharsetStringPart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 00:21:07 +00:00
orbiter
25aef069a6 continuing the String-hash to byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
low012
b97ad0f380 *) some minor changes for better code readability
*) added more SVN properties

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-05 12:37:33 +00:00
orbiter
ba51d140e1 added more information to an assert in the balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6782 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-28 22:59:19 +00:00
orbiter
a85c5bb8a7 added support for multiple (fail-over) network definition locations when HTTP locations are given. Multiple locations can be given as a comma-separated list of URLs pointing to the network definition file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 23:15:15 +00:00
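A minimal sketch of the fail-over idea in Java, assuming the locations are tried in the order listed and the first successful download wins. `NetworkDefinitionLocations`, `parse` and `loadFirstAvailable` are hypothetical names, and the `fetcher` function stands in for YaCy's real HTTP client, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: tries each comma-separated location in order until one loads. */
final class NetworkDefinitionLocations {

    /** Splits a comma-separated list of URLs, trimming whitespace and skipping empty entries. */
    static List<String> parse(String locations) {
        List<String> urls = new ArrayList<>();
        for (String part : locations.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) urls.add(url);
        }
        return urls;
    }

    /**
     * Returns the content of the first location whose fetch succeeds, or null if all fail.
     * The fetcher is a stand-in for the real download; it signals failure by
     * returning null or throwing a RuntimeException.
     */
    static String loadFirstAvailable(String locations, Function<String, String> fetcher) {
        for (String url : parse(locations)) {
            try {
                String content = fetcher.apply(url);
                if (content != null) return content; // success: stop failing over
            } catch (RuntimeException e) {
                // this location is unreachable; fall through to the next one
            }
        }
        return null; // every location failed
    }
}
```

Keeping the download behind a function makes the fail-over order testable without network access.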
orbiter
9b3840cb66 performance hacks for the template engine + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 22:52:48 +00:00
orbiter
5c10f8bc5f enhanced latest hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 07:19:49 +00:00
orbiter
b3238bec83 performance hack for httpd
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 07:09:55 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass the value as byte[], not as String. This should reduce the number of
byte[] <-> String conversions made during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
72d8e9897b removed unnecessary cache flush call in backend of BufferedRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 12:44:13 +00:00
orbiter
749ffbd642 - added another catch case to the index dump and index merge processes, which should keep them from blocking when the index dump and/or index merge throws an unexpected exception.
- reverted SVN 6766; it is too dangerous (it may cause unexpected memory usage) and should not be necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:46:40 +00:00
orbiter
9ddb8e4a43 set an option for the Java-internal image parser that prevents it from caching the image in a temporary file on the file system. This should speed up image parsing during image indexing dramatically and should also improve performance when showing the YaCy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
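The commit does not name the option. Assuming it is the standard `javax.imageio.ImageIO.setUseCache(boolean)` switch, a minimal sketch looks like this; `InMemoryImageParsing`, `parse` and `toPng` are hypothetical names. Note that the switch is JVM-wide, so it affects every `ImageIO` caller, not just one parser.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class InMemoryImageParsing {

    static {
        // JVM-wide ImageIO switch: when false, stream-based reads and writes
        // are buffered in memory instead of in a temporary file on disk.
        ImageIO.setUseCache(false);
    }

    /** Decodes image bytes; returns null if no installed reader recognizes the format. */
    static BufferedImage parse(byte[] imageBytes) {
        try {
            return ImageIO.read(new ByteArrayInputStream(imageBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes an image as PNG bytes (used here only to produce sample input). */
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```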
orbiter
312ca5d917 removed the flush at the end of every RWI entry, since it reduced write performance.
This should speed up RWI cache dump and RWI merge operations and should reduce the time the indexer is blocked during these processes.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:41:20 +00:00
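A generic illustration of the buffering principle with plain `java.io`, not YaCy's actual RWI writer: entries go through one buffered stream and are flushed once at the end, instead of forcing a flush to the underlying stream after each entry. `EntryDumper` is a hypothetical name.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class EntryDumper {

    /** Writes all entries through one buffer and flushes once at the end, not after every entry. */
    static void dump(List<String> entries, OutputStream target) {
        try {
            // The wrapper is not closed here because the caller owns the target stream.
            BufferedOutputStream out = new BufferedOutputStream(target, 64 * 1024);
            for (String entry : entries) {
                out.write(entry.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
                // no out.flush() here: flushing per entry would defeat the buffer
            }
            out.flush(); // single flush after the last entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```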
orbiter
0018163c07 moved table row/column matching method from front-end to back-end
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:01:27 +00:00
orbiter
e12f1fd821 - added setting of access rights for executable scripts after auto-installation
The correct access right was missing, especially for bin/apicall.sh

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6769 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 09:51:01 +00:00
orbiter
31e29a8831 - removed synchronization during index dump and index cleaning
- added semaphores to synchronize index dump and index cleaning for each process separately

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 07:09:53 +00:00
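A sketch of per-process semaphores, assuming each process should simply skip (rather than wait) when another run of the same process is still active; the commit does not say which behavior was chosen. `IndexMaintenance`, `tryDump` and `tryClean` are hypothetical names. Because dump and cleaning hold different semaphores, one can run while the other is in progress.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: dump and cleaning each get their own permit instead of one shared lock. */
final class IndexMaintenance {
    private final Semaphore dumpPermit = new Semaphore(1);
    private final Semaphore cleanPermit = new Semaphore(1);

    /** Runs the dump unless another dump is already in progress; returns whether it ran. */
    boolean tryDump(Runnable dump) {
        if (!dumpPermit.tryAcquire()) return false; // a dump is already running
        try {
            dump.run();
            return true;
        } finally {
            dumpPermit.release();
        }
    }

    /** Runs the cleaning unless another cleaning is already in progress; returns whether it ran. */
    boolean tryClean(Runnable clean) {
        if (!cleanPermit.tryAcquire()) return false; // a cleaning is already running
        try {
            clean.run();
            return true;
        } finally {
            cleanPermit.release();
        }
    }
}
```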
orbiter
95f31da8da increase dump cache queue length from 1 to 2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-24 20:36:35 +00:00
orbiter
6c093d6aed - enhanced domain navigator computation
- fixed domain navigator content when a must-match constraint was given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075 use a Pattern object with a precompiled regular expression to apply must-match constraints to search results: this should speed up pre-sorting of search results and produce richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
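A minimal sketch of the precompiled must-match filter, assuming full-match semantics as in `String.matches`. Calling `String.matches(regex)` for every result recompiles the expression each time; compiling a `Pattern` once per query and reusing it avoids that cost. `MustMatchFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MustMatchFilter {
    private final Pattern mustMatch;

    MustMatchFilter(String regex) {
        this.mustMatch = Pattern.compile(regex); // compiled once per query, not once per result
    }

    /** Keeps only the URLs that fully match the must-match expression. */
    List<String> filter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (mustMatch.matcher(url).matches()) kept.add(url);
        }
        return kept;
    }
}
```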
orbiter
e0da0a84b0 performance fix in http parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:52 +00:00
orbiter
90dd197ae7 - no latency for local crawls
- catch interrupted exception during 'fast' crawls in workflow processor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:18 +00:00
orbiter
bfb518cd47 some refactoring to make the LoaderDispatcher a little more independent of the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:28:03 +00:00
orbiter
36bd843ece fix for RFC 5322 conformance as suggested by Quix0r in http://forum.yacy-websuche.de/viewtopic.php?p=19585#p19585
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6754 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:23:47 +00:00
orbiter
c855fc48c6 only load robots.txt for the http and https protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:15:11 +00:00
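A minimal sketch of the protocol check, assuming the two protocols meant are http and https: robots.txt is an HTTP convention, so ftp, smb and file URLs have nothing to fetch. `RobotsPolicy.needsRobotsTxt` is a hypothetical name.

```java
import java.net.URI;

final class RobotsPolicy {

    /** True only for HTTP-based schemes; other schemes skip the robots.txt lookup. */
    static boolean needsRobotsTxt(URI url) {
        String scheme = url.getScheme();
        return scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
    }
}
```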
orbiter
748abfcffa added patches to prevent yacy-protocol DoS settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 15:31:15 +00:00
orbiter
e820ed061a avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
orbiter
11983bc936 redesigned some parts of the parser entry point:
- whenever the parser is entered, a whole set of candidate parsers is computed from the given mime type and file extension,
that means that all parsers whose registered mime type or file extension acceptance matches are considered.
Several parsers may therefore be tried for the same file; this succeeds in cases where previously only the mime type was used to choose the parser and the host httpd reported the wrong mime type.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 13:04:42 +00:00
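A sketch of candidate-based parser selection, assuming a parser qualifies when either its mime type or its file extension matches (that is what lets a wrong server mime type be recovered from) and that candidates are tried in registration order until one succeeds. `ParserSelection` and its `Parser` record are hypothetical stand-ins for YaCy's parser registry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class ParserSelection {

    /** Hypothetical parser descriptor: accepted mime types, accepted extensions, parse function. */
    record Parser(String name, Set<String> mimeTypes, Set<String> extensions,
                  Function<byte[], String> parse) {}

    /** Every parser whose mime type OR file extension matches is a candidate. */
    static List<Parser> candidates(List<Parser> registry, String mime, String extension) {
        List<Parser> result = new ArrayList<>();
        for (Parser p : registry) {
            if (p.mimeTypes().contains(mime) || p.extensions().contains(extension)) result.add(p);
        }
        return result;
    }

    /** Tries candidates in order; a wrong mime type no longer rules out the right parser. */
    static String parse(List<Parser> registry, String mime, String extension, byte[] content) {
        for (Parser p : candidates(registry, mime, extension)) {
            try {
                return p.parse().apply(content);
            } catch (RuntimeException e) {
                // this parser rejected the content; try the next candidate
            }
        }
        throw new IllegalArgumentException("no parser could handle the content");
    }
}
```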
orbiter
de88200e11 - added Byte Order Mark recognition to serverObjects
The BOM character U+FEFF may appear at the beginning of strings if some browsers prepend the characters %EF%BB%BF to input values.
see http://en.wikipedia.org/wiki/Byte_order_mark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6748 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 10:58:40 +00:00
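A minimal sketch of the BOM check, written as a standalone helper rather than the actual `serverObjects` code. Decoding the bytes EF BB BF as UTF-8 yields the single character U+FEFF, so one leading U+FEFF is dropped. `BomStripper.stripBom` is a hypothetical name.

```java
final class BomStripper {
    private static final char BOM = '\uFEFF';

    /** Removes one leading U+FEFF (the UTF-8 bytes EF BB BF decoded as text), if present. */
    static String stripBom(String value) {
        if (value != null && !value.isEmpty() && value.charAt(0) == BOM) {
            return value.substring(1);
        }
        return value;
    }
}
```

For example, URL-decoding `%EF%BB%BFyacy` as UTF-8 gives a string whose first character is U+FEFF; after `stripBom` only `yacy` remains.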
orbiter
89b4fff1c2 adapted the ant script for the new exif library
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:36:38 +00:00
orbiter
24e5faee75 added exif parsing for jpg images
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:23:38 +00:00
orbiter
82f76e1296 removed log line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 20:31:38 +00:00
orbiter
0f8004f9da enhanced html parser to recognize a href tags inside header tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 17:52:07 +00:00
orbiter
3300930fc5 - (almost) fixed FTP crawler
- integrated/fixed SMB crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
1198b9989d bugfixes, more sorttable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 15:39:36 +00:00
orbiter
9623d9e6d2 added a smb loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
ae2f3f000f better handling when a table copy is abandoned .. prevents a memory leak
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 13:32:15 +00:00
orbiter
0769517129 added a robots.txt monitor in the crawler monitor submenu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 11:31:15 +00:00
orbiter
48995e71c4 added soft-auth to general authentication scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 00:07:17 +00:00
orbiter
72f00dee59 removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
57e1eae95e longer time-out for URL fetching .. may help to show all the links that the statistics report for a search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6727 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 22:23:08 +00:00
orbiter
9e639603e3 after frequent occurrences of 100% CPU usage and permanent blocking, I am trying to disable a function in a method that may cause the problem when calling an external library (apache http client 3.x). The thread dump that shows the problem is attached here.
	at java.lang.StringCoding.encode(StringCoding.java:266)
	at java.lang.String.getBytes(String.java:946)
	at org.apache.commons.httpclient.util.EncodingUtil.getAsciiBytes(EncodingUtil.java:237)
	at org.apache.commons.httpclient.methods.multipart.Part.sendDispositionHeader(Part.java:220)
	at org.apache.commons.httpclient.methods.multipart.Part.send(Part.java:308)
	at org.apache.commons.httpclient.methods.multipart.Part.sendParts(Part.java:385)
	at org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity.writeRequest(MultipartRequestEntity.java:164)
	at de.anomic.http.client.Client.zipRequest(Client.java:364)
	at de.anomic.http.client.Client.POST(Client.java:339)
	at de.anomic.yacy.yacyClient.wput(yacyClient.java:285)
	at de.anomic.yacy.yacyClient.transferURL(yacyClient.java:1053)
	at de.anomic.yacy.yacyClient.transferIndex(yacyClient.java:942)
	at de.anomic.yacy.dht.Transmission$Chunk.transmit(Transmission.java:200)
	at de.anomic.yacy.dht.Dispatcher.storeDocumentIndex(Dispatcher.java:397)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:103)
	at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:66)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:637)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:19:23 +00:00
orbiter
4144927d94 show fewer errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:02:08 +00:00
orbiter
b88f5fbb4b slightly changed crawling policy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:46:08 +00:00
orbiter
de01fe0e6d fix for bug in url parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:33:18 +00:00