Commit Graph

391 Commits

Author SHA1 Message Date
sixcooler
d88b9606d1 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2923
+ some client fine tune

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 16:58:33 +00:00
orbiter
6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
- confirm that database objects that are not supposed to grow do not have a index memory management that is designed for growth
- changed index sorting method in such a way that it allocates less objects during quicksort
- database classes classes renaming (shorter, naming addresses that objects hold in RAM)
- added a large number of asserts to check if objects actually take the RAM that they should have


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 13:33:12 +00:00
orbiter
5924a0d851 - enhanced concurrency in database index access for multicore
- added statistics about database index caches in PerformanceMemory_p.html
- adoped many classes to use the new statistics
- added missing close statements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 04:58:48 +00:00
orbiter
55a2536bcf enhancement in drawing speed and reduction of object allocation during drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:44:08 +00:00
orbiter
9ab06bc333 enhancement in sorting efficiency (database root operation): less object allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:42:28 +00:00
sixcooler
39d96abbb5 fix yacyRelease download
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2920&p=20545#p20545)
better cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 20:13:20 +00:00
sixcooler
349e4dee9d ... migrating to HttpComponents-Client-4.x ...
added cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7012 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 14:16:44 +00:00
sixcooler
c29f24a519 ... migrating to HttpComponents-Client-4.x ...
- Proxy
- Release-download

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-01 22:35:11 +00:00
orbiter
d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 11:40:58 +00:00
orbiter
989948e1a9 fixed generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 07:13:15 +00:00
orbiter
e1015ead2c static access to constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:52:58 +00:00
orbiter
27d8a8b53e removed wrong com.sun.codec class access in generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7003 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:49:09 +00:00
orbiter
bbf887d879 added generics to UPnP classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7002 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:48:01 +00:00
sixcooler
15e8c13526 ... migrating to HttpComponents-Client-4.x ...
(gzip decompression, httploader, robots, ...)

+ enable proxy-crawling while log is fine

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 01:16:26 +00:00
mikeworks
b12db14b9f Added Generics to new net.yacy.upnp.* classes to eliminate compiler warnings
Added @Deprecated for deprecated functions getIPDevices and getPPPDevices in class InternetGatewayDevice
Changed debug statement in Domains.java and corrected filename in comments header

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-24 13:48:45 +00:00
sixcooler
b7102eff92 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 23:08:37 +00:00
mikeworks
572e429eff - fixes UPnP not working discussion on forum: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2881
SVN 6987 fixed net.yacy.upnp.devices.UPNPRootDevice for usage with JxPath > 1.3 by using a default namespace (xmlns="urn:schemas-upnp-org:device-1-0")
This commit now fixes the same problem for net.yacy.upnp.devices.UPNPService with default namespace (xmlns="urn:schemas-upnp-org:service-1-0")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 19:13:37 +00:00
mikeworks
2a20282505 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6987 6c8d7289-2bf4-0310-a012-ef5d649a1542 2010-07-22 19:07:02 +00:00
lotus
965aa97993 including sbbi upnplib as source again
http://www.sbbi.net/site/upnp/index.html

renamed package to yacy
all options are also named "yacy" instead of "sbbi"

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 18:02:16 +00:00
orbiter
60caade056 removed debug output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6984 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 07:59:22 +00:00
sixcooler
52718e6dcb ... migrating to HttpComponents-Client-4.x ...
monitoring: replaced unused 'idletime' by uploading bytes
added some kind of 'upload-throttling' at dht-out :-)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 00:51:41 +00:00
sixcooler
5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
monitoring and first try to use remoteProxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-20 01:14:28 +00:00
orbiter
dec1419bc3 ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:18:32 +00:00
orbiter
22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:14:20 +00:00
orbiter
8674a65488 removed override directive which caused a compile error in eclipse helios
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 18:37:20 +00:00
low012
dc5f0e357c *) fixed SVN properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:02:03 +00:00
low012
01d6b952f0 *) minor changes for easier to read code, no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:00:43 +00:00
sixcooler
0e56d29335 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-15 00:59:53 +00:00
sixcooler
2ad5829b26 correct Timeoutparamter at HttpComponents-Client-4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-14 02:47:40 +00:00
sixcooler
e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-13 22:10:24 +00:00
sixcooler
c5c67f0504 start migrating to HttpComponents-Client-4.x
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-12 23:07:05 +00:00
orbiter
25024d6ab2 fix for problen when accessing the metadata index. The index was not available for all peers with no RAM table copy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 07:22:50 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents.
To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
low012
d4851441b0 *) Added Android packages to parser in order to be able to create a decentralized search for direct downloads of Android apps.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 20:41:08 +00:00
orbiter
150cf42a1b migrated all my LGPL 3 -licensed files to the LGPL 2.1 because LGPL 3 is not compatible to the GPL 2
see http://www.gnu.org/licenses/license-list.html for explanation
Since (as far as I know) nobody else has ever contributed to these files I may be allowed to just apply an older license.
You may consider this as a dual-licensing and may use and optionally replicate the older files under GPL 3.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 16:25:14 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00
orbiter
bf25407fdd added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 23:10:57 +00:00
orbiter
1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
- added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 22:39:27 +00:00
orbiter
5a4684f21f allow words with length >= 2 (you can't search for 'wm' with 3-letter words...)
lets try that. If we run into a memory problem because of too many 2-letter-words, then we must introduce whitelists for 2-letter words.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 16:31:26 +00:00
orbiter
37b8827a7a - removed the UPnP library sources from sbbi and added the jar library again. The library was included to get support for fedora releases, but after this time the fact that the sbbi cannot be part of fedora should be re-discussed. If this will still not be possible, then we may integrate the sbbi UPnP package using reflection.
- cleaned uo the code. The new eclipse helios provided new warnings for dead code. This change cleans up most of these warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 10:32:47 +00:00
orbiter
103c848af8 enhancements in image drawing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-24 13:20:45 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
a cache access shall not made directly to the cache any more, all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, a enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods without the usage of a cache strategy have been removed. This affects also the verify-option
of the yacysearch servlet. If there is a 'verify=false' now after this commit this does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved using the cache only are presented. This still means that the search hit was not verified because the snippet was generated using the cache. If a cache-based generation of snippets is not possible, then the verify=false causes that the link is not rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
orbiter
7e2d6fac12 patch for bad values during local search join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:31:00 +00:00
orbiter
986d4f34d9 added a consistency check for new queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 18:59:42 +00:00
orbiter
73f03e05ee fixed a bug in snippet fetch strategy: cache only does not help if resource can only be found in web
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 15:25:25 +00:00
orbiter
fbf021bb50 redesign of index abstract processing - currently disabled until enough peers have fix in SVN 6928
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 09:44:21 +00:00
orbiter
87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
- removed concurrency overhead for small number of index normalizations as it happens during remote search
- removed 'load only parseable' constraint for snippet fetch because some resources may not have any url file extension and these had therefore not been parseable and searcheable since they may become parseable after loading when their mime type is known
- this partly fixes some problems with http://forum.yacy-websuche.de/viewtopic.php?p=20300#p20300 but more changes are necessary to get all expected search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-17 11:59:40 +00:00
orbiter
7ddb70e7c6 new license for ai.greedy component: LGPL (nobody else than me modified that code)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 22:16:03 +00:00
orbiter
de4f30bb2e UTF-8 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:22:31 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
51332b787d reverted SVN 6869 as discussed with dulcedo in car after LinuxTag:
missing time-out may be cause of locks during DHT-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 20:30:53 +00:00
orbiter
b03caaa57a better handling of OOM situations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 19:44:05 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
a83772c71b fixes and enhancements for balancer:
- crawl lists for each domain now uses a HandleSet which should use less memory than LinkedLists
- but: fill more entries into the domain lists (all available entries)
- fixes to selection criteria (best domain selection)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 09:30:23 +00:00
orbiter
9cde05418f fixed url crawl list display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-31 00:27:00 +00:00
orbiter
2eea806005 less errors in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 11:18:05 +00:00
orbiter
3f93a0cc8f redesign of remote proxy settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-26 00:01:16 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
6950d8a33d fixes to SMB crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-23 01:17:44 +00:00
orbiter
98c1d65415 - show up to 10 locations (maps) after search (instead of a max of 5)
- order locations by (primary) population and (secondary) longitude (reverse ordering, both)
- added population from GeoNames, OpenGeoDB does not have that information
- changed default viewpoint of map to (30,15); shows more land and europe in the center

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-21 08:18:04 +00:00
orbiter
9842fab6e4 - fixes to query parameter
- replaced/removed search query attribute (was old style, new is 'query' according to SRU)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-20 22:05:04 +00:00
orbiter
1defd580bc - added option to localization search to distinguish between a search for a location according to the search word only or for the relation between a web search results and locations found in the metadata fields
- used that to display two layers on map: cities and search result locations
- added many marker grafics for the display of the markers on the map
- some refactoring of the yacy news code plus bugfixes for latest move from Tree to Table data structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-19 12:53:09 +00:00
orbiter
bd0a9df895 fix for bad location double check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6884 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-18 11:54:30 +00:00
orbiter
e43e61e502 added another geolocalization data source: GeoNames
- added downloader option in DictionaryLoader
- added generalization (interfaces and overarching localization)
- more abstraction using the libraries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 23:49:30 +00:00
orbiter
118d589eff replaced the very very old data structure 'Records' with a simple table to fix the problem from
http://forum.yacy-websuche.de/viewtopic.php?p=20066#p20066

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 00:59:02 +00:00
orbiter
2a8f70f0ca - fix for caching of OSM tiles. if you want that this fix applies to your peer, please delete the crawl profiles
- fix for initial generation of crawl profiles (one more reason to remove your crawl profiles)
- more String -> byte[] migration
- more logging for cache store/hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 23:50:07 +00:00
orbiter
2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
- migrated the opengeodb downloader to a new version of the opengeodb-dump


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 18:30:11 +00:00
orbiter
439b44be9e removed exit from computation in ReferenceContainerArray.get merge method
an warning is still given, but method computes at normal operation
see also: http://forum.yacy-websuche.de/viewtopic.php?p=20038#p20038

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 23:36:40 +00:00
orbiter
789c6b26ce added a location search service: using the following servlet/example:
http://localhost:8080/yacysearch_location.kml?query=berlin&maximumTime=2000&maximumRecords=100

This will open any application that can consume kml data (which will probably be google earth) on your computer and displays the search result as positions on a map


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 12:58:05 +00:00
orbiter
f23cbd2dab more bugfixes to date parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:32:46 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
6eba2cb96b fix in bmp parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-09 13:27:58 +00:00
orbiter
c45117f81f fixed dates in metadata
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-08 22:09:36 +00:00
orbiter
0a5fd15703 :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:06:31 +00:00
orbiter
ac16f582aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=20017#p20017
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:04:30 +00:00
orbiter
a7d038bb7a The oai ListFriends source list becomes configurable: just write them into defaults/oaiListFriendsSource.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 10:01:37 +00:00
orbiter
5fbf866cae - fixed resumption token generation for oai-pmh import
- relaxed dublin core parsing: the dc:reference tag may replace dc:identifier if this does not contain a valid url
- parsing of completeRecords number and presentation in the download list of oai import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-02 22:20:24 +00:00
orbiter
fc5efcc05a enhanced and fixed OAI-PMH import
- now importing OAI-PMH server list fron two sources
- simultanous import from several servers (even > 2000)
- check buttons on OAI-PMH server list to select multiple servers for import start
- it is possible to select all servers at once for import
- imported XML data is gzipped after import from surrogate reader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-30 14:03:51 +00:00
orbiter
455a763d7c performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-28 08:38:57 +00:00
orbiter
b6cce08019 fixed a bug in rwi storage data size allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 22:22:16 +00:00
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused a out of bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
b18a7606a0 some performance hacks and fixed after reading dump in
http://forum.yacy-websuche.de/viewtopic.php?p=19920#p19920

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:37:36 +00:00
orbiter
2bc3cba6f1 - fix for 'do not write to cache' rule.
- do not read from cache if byte[] array is still filled from response object (will do less IO)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-24 08:22:45 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

The finalize method prevents that the memory, used by the objects containing the finalize method, is collected and available for the garbage collector. Instead, the memory allocated by such classes are enqueued to a java-internal finalize queue runner. This slows down all operations that uses a lot of object containing finalize methods.

this fix does not remove all finalize method, but such that may be used for throw-away objects that are allocated many times. This should cause a better run-time performance and less OutOfMemoryErrors 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00
orbiter
cff8ed134f added index check to prevent blocking in synchronization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-22 22:16:38 +00:00
orbiter
5ab5ac80fe fix for NPE in TextParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 22:35:47 +00:00
orbiter
b95ae2518b fix for assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 17:59:22 +00:00
orbiter
3247f0e901 fix for deadlocks caused by self-blocking access to TreeMap in concurrent environments. The TreeMap was replaced by a ConcurrentHashMap and additional care that the strings are compared all in lowercase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:46:02 +00:00
orbiter
027b971bde fix for concurrent quicksort: catch jobs from ThreadPoolExecutor that had been rejected because of full processing queues.
Non-catched jobs may have been the cause for blockings and freezes in case of overloading during strong processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:44:59 +00:00
orbiter
8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 11:29:27 +00:00
orbiter
7b69d79727 enhanced remove() operation: in many cases it is not necessary to return the removed object to the called.
for such cases the delete() operation was introduced which is sometimes much cheaper in operation since it does not need to create objects to hold the removed content and it does not need to read those objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 14:47:41 +00:00
orbiter
93ea0a4789 enhanced remove operation in search consequences (which are triggered when the snippet fetch proves that the word has disappeared from the page that was stored in the index)
- no direct deletion of referenced during search (shifted to time after search)
- bundling of all deletions for the references of a single word into one remove operation
- enhanced remove operation by caring that the collection is stored sorted (experimental)
- more String -> byte[] transition for search word lists
- clean up of unused code
- enhanced memory allocation of RowSet Objects (will use a little bit less memory which was wasted before)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 13:45:22 +00:00
orbiter
7a59012632 fix for NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 07:43:48 +00:00
orbiter
1a6c2f77b4 fix for NPE in statistic servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 00:08:43 +00:00
orbiter
64f29f990e a collection of performance hacks and code cleanup:
- removed usage of URL-Caches which could have been a memory leak
- removed unused classes and methods
- removed not necessary synchronizations
- added synchronization hacks where possible
- fine-tuned crawling speed to prevent IO of balancer
- fixed a bug in IODispatcher that may have caused that no merges were done
- reduced number of parameters in very often called methods (compare methods)
- reduced complexity of data structures of now massively used HandleSet class
- reduction of new String() and getBytes() usage / new methods to support this transition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-19 16:42:37 +00:00
orbiter
8b8107b2a3 reduced IO-load and synchronization/blocking
- enhanced the Balancer performance when building new domain stacks using a new Table buffer
- added the new Table buffer BufferedObjectIndex class
- changed order of access to LURL-read (prefereing segment over Crawl Queues) will reduced blocking time on balancer
- fixed PPM setting in Crawler_p servlet (had doubled values)
- reduced synchronization in IndexCell because it is not necessary: reduced blocking during indexing/merging/dumping
- removed did-you-mean cache in IndexCell because that caused too much overhead and more memory usage but was not very useful. This reduced also deadlocks that could be causes when searched are performed during indexing.




git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-18 21:55:20 +00:00
orbiter
ed07046870 flush only when > 3000 RWIs present + code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-16 16:07:19 +00:00
orbiter
3a50b5aa04 enhanced object hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 14:19:29 +00:00
orbiter
1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
The result should be a less usage of new String() and less memory usage (since a String-encapsulated byte[] has 40 bytes overhead)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
orbiter
dde394a977 - shifted some computation out of synchronization to allow more concurrency
- removed synchronization where not necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 23:22:06 +00:00
orbiter
f204076d25 removed usage of temporary files: causes too much IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 22:17:18 +00:00
orbiter
650be3599f added a time-out to the RWI cache to flush the cache if it has not been written for ten minutes. This additional dump criteria is necessary because some data sources repeat their vocabulary and may cause that the number of words in a RWI does not increase while the number of references in the RWI set increases. Now the RWI Buffer is flushed every 10 minutes or later if at that time already a dump is ongoing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 20:30:34 +00:00
orbiter
ff6cf24b80 replaced RowSetArray in ObjectIndexCache with RowSet to reduce complexity in MergeIterator. This complexity caused too much computing overhead when the RowSetArray had become very large.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 19:26:51 +00:00
orbiter
55d8e686ea performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:29:55 +00:00
orbiter
2f181d0027 introduced concurrency in HTCACHE storage compression
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 16:22:09 +00:00
orbiter
2e26744f4e more concurrency when normalizing RWI entries + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:47:57 +00:00
orbiter
aa083fc45c try to get a fix for OOM problem in case that there is no real problem with missing memory.
See also http://forum.yacy-websuche.de/viewtopic.php?p=19835#p19835

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:39:54 +00:00
orbiter
70e6222978 more concurrency during search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:12:36 +00:00
low012
dc93cec3a8 *) Java 1.5 compatibility (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=2764)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 00:25:46 +00:00
orbiter
67ec58d8e7 search performance enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-12 07:31:43 +00:00
hermens
ef467a0303 Another workaround for the second part of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770
This should prevent URLs with bad referrer entries from being dropped by transferURL or even crashing the whole Transmission$Chunk


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 13:57:46 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass value as byte[], not as String. This should cause that less
byte[] <-> String conversions are made during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
72d8e9897b removed unnecessary cache flush call in backend of BufferedRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 12:44:13 +00:00
orbiter
749ffbd642 - added another catch case for the index dump and index merge process that should cause non-blocking behavior in case that index dump and/or index merge caused any unexpected exception.
- reverted SVN 6766, this is too dangerous (may cause unexpected memory usage) and should not be necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:46:40 +00:00
orbiter
9ddb8e4a43 set an option for the java-internal image parser that prevents that the image is cached using the file-system in a temporary file. This should speed up image parsing during image indexing dramatically and should also cause better performance when showing the yacy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
orbiter
312ca5d917 removed flush at end of every rwi entry since this reduces the write performance.
This should speed up RWI cache dump and RWI merge operations and should cause less blocking time during these processes for the indexer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:41:20 +00:00
orbiter
0018163c07 moved table row/column matching method from front-end to back-end
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:01:27 +00:00
orbiter
31e29a8831 - removed synchronization during index dump and index cleaning
- added semaphores to synchronize index dump and index cleaning for each process separately

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 07:09:53 +00:00
orbiter
6c093d6aed - enhanced domain navigator computation
- fixed domain navigator content in case that a mustmatch constraint was given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075 using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: should speed up pre-sorting of search results and should cause richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
orbiter
e0da0a84b0 performance fix in http parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:52 +00:00
orbiter
90dd197ae7 - no latency for local crawls
- catch interrupted exception during 'fast' crawls in workflow processor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:18 +00:00
orbiter
bfb518cd47 some refactoring to get the LoaderDispatcher a little bit more independent from the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:28:03 +00:00
orbiter
36bd843ece for for RFC5322 comformance as suggested by Quix0r in http://forum.yacy-websuche.de/viewtopic.php?p=19585#p19585
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6754 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:23:47 +00:00
orbiter
748abfcffa added patches to prevent yacy-protocol DoS settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 15:31:15 +00:00
orbiter
e820ed061a avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
orbiter
11983bc936 redesigned some parts of the parser entry point:
- in all cases that the parser is entered it is a whole set of possible parsers computed according to given mime type and file extension,
that means that all parsers are considered where the registered mime acceptance and extension acceptions matches.
that may cause that several parsers are tried for the same file which will cause a success in cases where there was only the mime type was used to choose the right parser and the mime type was given wrongly by the host httpd.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 13:04:42 +00:00
orbiter
89b4fff1c2 adopted ant script for new exif library
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:36:38 +00:00
orbiter
24e5faee75 added exif parsing for jpg images
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:23:38 +00:00
orbiter
82f76e1296 removed log line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 20:31:38 +00:00
orbiter
0f8004f9da enhanced html parser to recognize a href tags inside header tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 17:52:07 +00:00
orbiter
3300930fc5 - (almost) fixed FTP crawler
- integrated/fixed SMB crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
1198b9989d bugfixes, more sorttable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 15:39:36 +00:00
orbiter
9623d9e6d2 added a smb loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
ae2f3f000f better handling of table copy abandon .. prevent memory leak
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 13:32:15 +00:00
orbiter
0769517129 added a robots.txt monitor in the crawler monitor submenu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 11:31:15 +00:00
orbiter
72f00dee59 removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
de01fe0e6d fix for bug in url parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:33:18 +00:00
orbiter
c4bdb1e7f2 added one more option in ViewFile to show an iframe like for the orginal web page content but using the cache than the direct link to the content in the web. Upgraded the very old and previously not any more used CacheResource_p servlet to a new and working version.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 23:41:51 +00:00
orbiter
1bbe14d23f SVN 6716 unfortunately contained parts of the unfinished SMB integration. To fix compile errors the remaining parts of the SMB implementation stub is added with this commit.
This adds the jcifs smb library.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:46:22 +00:00
orbiter
884b262130 - added a new Wiki Namespace Navigator
- some redesign of Navigator data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:25:49 +00:00
orbiter
270fb38674 - fixed some bugs in Table viewer
- added 'select all' feature in Tables_p
- enhanced ViewFile.html: has now an input field to load arbitrary resources from the web and analyze them (!!!)
- included the ViewFile servlet into the Index Administration menu
- show in ViewFile if ressource is in url-db and/or in Web cache
- bugfixes to BEncodedHeap and Tables management

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 15:41:15 +00:00
orbiter
727dd9b193 - fixed a bug in robots.txt parser
- moved storage of robots.txt entries to WorkTables, so it is now possible to browse the robots entries with the table browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 11:58:07 +00:00
orbiter
54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
- added evaluation and indexing prevention in case that a noindexing is given in a html file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-03 23:32:56 +00:00
sixcooler
cd6de83905 next try for for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2703
(reverted 6692)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-23 15:59:58 +00:00
sixcooler
bfe4693e9a fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2703
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-23 13:46:56 +00:00
orbiter
564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-16 23:06:04 +00:00
orbiter
30c8185139 fix for sid check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:31:32 +00:00