4726 Commits

Author SHA1 Message Date
orbiter
9ebc75db4b fix for channel authorization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 23:14:02 +00:00
orbiter
6d9e5865ee faster appearance of search result page (but complete search time is the same)
this was inspired by http://bugs.yacy.net/view.php?id=37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:17:02 +00:00
orbiter
f7ca84cfc0 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:15:13 +00:00
orbiter
84c9658644 added a file type navigator
added a protocol navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 15:39:52 +00:00
orbiter
31283ecd07 - added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 11:57:17 +00:00
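The protocol filter described in commit 31283ecd07 can be pictured with a small sketch. The class and method names are hypothetical and are not taken from the YaCy sources; only the four modifiers /ftp, /http, /https and /smb come from the commit message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not YaCy code: separates search terms from a protocol modifier.
final class ProtocolModifierSketch {
    // The four modifiers named in the commit message.
    private static final Set<String> PROTOCOLS =
            new HashSet<>(Arrays.asList("ftp", "http", "https", "smb"));

    private static boolean isModifier(String token) {
        return token.startsWith("/")
                && PROTOCOLS.contains(token.substring(1).toLowerCase(Locale.ROOT));
    }

    // Returns the requested protocol in lower case, or null if the query has no modifier.
    static String protocolOf(String query) {
        for (String token : query.trim().split("\\s+")) {
            if (isModifier(token)) return token.substring(1).toLowerCase(Locale.ROOT);
        }
        return null;
    }

    // Returns the query with all protocol modifiers removed.
    static String termsOf(String query) {
        StringBuilder terms = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (isModifier(token)) continue;
            if (terms.length() > 0) terms.append(' ');
            terms.append(token);
        }
        return terms.toString();
    }
}
```

For the example from the commit message, protocolOf("passwd /ftp") yields "ftp" and termsOf("passwd /ftp") yields "passwd".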
orbiter
4b425ffdd2 fix for http://bugs.yacy.net/view.php?id=41
added another RSS channel "PROXY". the rss feed for peer news filters this channel if there is not an authorized access on that channel


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-22 10:19:32 +00:00
orbiter
7db208c992 performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-21 23:10:50 +00:00
orbiter
87bd559c42 fixed warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:53:43 +00:00
orbiter
f30d36b101 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:02:06 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
sixcooler
7bfa6bb4b6 prevent getting a yacySeed from zero-length-hash-string by chance
(for eg.: proxy-crawls got displayed as initiated by some other peer)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 22:58:17 +00:00
orbiter
bce280a308 update on options for interface graphics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 22:48:21 +00:00
orbiter
2683162ec5 - added more options to access grid picture, web structure picture and network graphics
- remove test class


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-02 23:27:26 +00:00
orbiter
0c1b29f3c9 - applied many small performance hacks
- added a memory limitation in the zip parser and the pdf parser
- added search throttling: if too many search queries are still waiting to be computed, new requests are not accepted for some time. If after one second there is still no capacity to perform another search, the search terminates with no results. This should only happen in DoS-like situations or under strong load on a peer, for example if it is integrated in metager.
- added a search cache deletion process that removes search requests in case that throttling happens

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-01 19:31:56 +00:00
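The throttling behaviour described in commit 0c1b29f3c9 (wait a bounded time for capacity, then return no results) might look roughly like the following. This is a sketch under assumptions: the class name, the use of a Semaphore and the result type are illustrative choices and do not reflect how YaCy implements it.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical throttle, not YaCy code: bounds the number of concurrent searches.
final class SearchThrottleSketch {
    private final Semaphore slots;
    private final long maxWaitMillis;

    SearchThrottleSketch(int maxConcurrentSearches, long maxWaitMillis) {
        this.slots = new Semaphore(maxConcurrentSearches);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Runs the search if a slot becomes free within maxWaitMillis, else returns no results.
    List<String> search(Supplier<List<String>> work) {
        boolean acquired;
        try {
            acquired = slots.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
            return Collections.emptyList();
        }
        if (!acquired) return Collections.emptyList();
        try {
            return work.get();
        } finally {
            slots.release(); // always free the slot, even if the search throws
        }
    }
}
```

With the one-second limit from the commit message, the throttle would be created as new SearchThrottleSketch(n, 1000), where n is a capacity the operator has to choose.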
f1ori
900dacbf97 * improve link rewriting in proxy-url
* only rewrites links, which are in current search domain

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-01 13:27:04 +00:00
f1ori
dc855d881b * further improve proxyurl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 21:25:20 +00:00
orbiter
a7a6b392f5 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 10:16:43 +00:00
orbiter
fe0c08455b more concurrency (enhancement) hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 08:53:58 +00:00
orbiter
0e9a99cb05 another resource hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 07:51:18 +00:00
orbiter
535b6b953c more hacks to omit superfluous string object allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 07:31:17 +00:00
orbiter
87082f407e less String object creation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 04:19:20 +00:00
orbiter
ab5a16b957 less memory occupation during ranking and faster host navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-29 20:33:12 +00:00
orbiter
1489ebeedf one more hack to free ram for search events
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 14:26:37 +00:00
f1ori
ddcc333acc * fix negative result counts
results sorted out by add to RankingProcess were counted in
sortedout-counter, but were not added to remote_indexCount nor
local_indexCount

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 11:21:00 +00:00
orbiter
fa734bdf9f better memory protection in search logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7748 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 11:18:22 +00:00
orbiter
dbea40d536 - changed snippet fetch strategy logic: do not check if entry is in cache. This should reduce IO load on the HTCACHE which is a showstopper during large number of search requests
- forced a possible short memory status when a search is started to flush caches that may cause search-heaps with resource contention effects

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 09:32:03 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
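The idea in commit 4bea3f9714 is that a pure ASCII string can be converted with a plain cast per character instead of going through a charset encoder. A minimal sketch, with hypothetical class and method names that are not taken from the YaCy sources:

```java
// Hypothetical helper, not YaCy code: charset-free conversion for pure ASCII strings.
final class AsciiBytesSketch {
    // Valid only if every char is below 128, as is the case for base64 hashes.
    static byte[] encode(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) b[i] = (byte) s.charAt(i);
        return b;
    }

    // Inverse of encode(); each byte becomes exactly one char.
    static String decode(byte[] b) {
        char[] c = new char[b.length];
        for (int i = 0; i < c.length; i++) c[i] = (char) (b[i] & 0xFF);
        return new String(c);
    }
}
```

For ASCII input the result of encode() equals s.getBytes(StandardCharsets.US_ASCII); for any other input the cast silently truncates, so the caller has to know that the data is ASCII.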
orbiter
746e3c3b06 Replaced a widely-used Property Object in the httpd with HashMap<String, Object> which is not synchronized like Properties
A synchronization is not needed here and applies an overhead to the httpd process which is now removed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 16:34:35 +00:00
f1ori
14e1666b21 * fix replacing regexes in url proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 16:09:29 +00:00
orbiter
e28bd0d038 fix for some possible causes of memory leaks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 14:35:32 +00:00
orbiter
09ba6814c0 - non-blocking word hash computation with dynamic digest object generation (this was important!)
- (very) small performance enhancement in did-you-mean


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7740 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 12:58:11 +00:00
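Commit 09ba6814c0 does not show its code, so the following is only one reading of "non-blocking word hash computation with dynamic digest object generation": each call creates its own MessageDigest instead of synchronizing on a shared one. The class name, the MD5 algorithm and the hex output are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not YaCy code: hashing without a shared, locked digest object.
final class WordHashSketch {
    static String hash(String word) {
        MessageDigest digest;
        try {
            // A fresh digest per call means threads never wait for each other.
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        byte[] raw = digest.digest(word.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }
}
```

The trade-off is a small allocation per call in exchange for removing the lock that all hashing threads would otherwise contend for.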
orbiter
10e2f588f8 - enhanced ybr ranking computation
- many speed/performance hacks
- added solr sharding and new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 10:57:02 +00:00
orbiter
bd55dcee50 - commented out experimental distributed ranking loading
- less threads for blocking threads
- disable all threads for DHT transmission for networks with zero peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 21:08:01 +00:00
orbiter
d1dbbd956a always use a template method cache even if the template cache flag is set to false. This flag is only meant to allow dynamic updates to the template files, not dynamic updates to the rewrite methods (which is not possible without recompiling). Low memory usage is guaranteed by the use of soft references, which are dropped before an OOM is thrown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 09:31:07 +00:00
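The soft-reference argument in commit d1dbbd956a can be illustrated with a generic cache. This is a sketch with hypothetical names, not the template engine's actual cache: entries are held through SoftReference, so the garbage collector may clear them under memory pressure and the value is simply loaded again on the next access.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache, not YaCy code: values may be reclaimed by the GC when memory is short.
final class SoftCacheSketch<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    // Returns the cached value, or loads and caches it if absent or already cleared.
    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The Java specification guarantees that all softly reachable objects are cleared before an OutOfMemoryError is thrown, which is the property the commit message relies on.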
orbiter
0d040ff6bb fix for bug 0000036: no crawling of https pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 09:14:32 +00:00
orbiter
3ed4a09368 small features, some bug fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-23 21:08:04 +00:00
orbiter
e55c254f7b enhanced logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-22 20:12:13 +00:00
orbiter
b45701d20f this is a re-implementation of the YaCy Block Rank feature
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribution and committed them here as well
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit, but it does not affect your current installation, only fresh peers
- a recursive block rank refinement is implemented but disabled at this point. it needs more testing

Please play around with the ranking settings and see if this helped to make search results better.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-18 14:26:28 +00:00
orbiter
021840e5ba removed (almost) deadlocks and unnecessary CPU load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-17 00:00:01 +00:00
orbiter
123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently only serves for indexes to the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces.

The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface provides already a partial backward index and therefore a collection of all these indexes needs only to be joined which is very easy. The result should be the computation of new block rank tables that all peers can perform.

To reduce load from peers this servlet buffers all data and refreshes it only once in 12 hours. This very slow update cycle is needed because the interface will be called round-robin from all peers once after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-15 22:57:31 +00:00
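The 12-hour buffering mentioned in commit 123375bfba amounts to a time-bounded cache in front of an expensive computation. A minimal sketch, assuming hypothetical names and an injectable clock so the behaviour can be checked without waiting; it does not reflect the servlet's actual code:

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical buffer, not YaCy code: recomputes a value at most once per maxAgeMillis.
final class TimedBufferSketch<T> {
    static final long TWELVE_HOURS_MS = 12L * 60 * 60 * 1000;

    private final long maxAgeMillis;
    private final LongSupplier clock;   // e.g. System::currentTimeMillis
    private final Supplier<T> loader;   // the expensive computation
    private T value;
    private long loadedAt;
    private boolean loaded;

    TimedBufferSketch(long maxAgeMillis, LongSupplier clock, Supplier<T> loader) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.loader = loader;
    }

    // Returns the buffered value and refreshes it only when it is older than maxAgeMillis.
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= maxAgeMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

In production use the clock argument would be System::currentTimeMillis and the first argument TimedBufferSketch.TWELVE_HOURS_MS.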
orbiter
1d8b0f74f4 one more fix for SVN 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 15:31:24 +00:00
orbiter
0960261769 fix for svn 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 15:20:57 +00:00
orbiter
5b579e21a3 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:21:40 +00:00
orbiter
039126cfaf better handling of on/off switched solr indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-08 22:47:20 +00:00
orbiter
9248a4eef4 reduce the effect of 'Bildersuche findet generierte HTML-Seiten als Bilder' (image search finds generated HTML pages as images)
see http://bugs.yacy.net/view.php?id=9

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 07:37:46 +00:00
orbiter
0621a15f89 fix for wrong search result counter: added a counter for all filtered out entities
see also http://bugs.yacy.net/view.php?id=5

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-06 23:04:27 +00:00
orbiter
9c33b2fb58 fix for String Matcher in case that no snippet is returned (NPE)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 23:11:03 +00:00
orbiter
76f2817e00 a fix for the snippet computation and hopefully better snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 23:05:38 +00:00
orbiter
deda54d684 - relaxed matching of string-search (this is now case-insensitive)
- added transport of string-search pattern to remote search protocol
- fixed a problem parsing snippets with a '-' inside

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 22:37:06 +00:00
orbiter
6e42d4de88 - added full-String search function: find things that match exactly what is quoted in the query
- re-structuring authentication methods to fix a problem with API steering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 00:25:14 +00:00
apfelmaennchen
8b8db2aaba YMarks: some small changes/fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-03 21:21:06 +00:00
apfelmaennchen
441035f1f4 YMarks: some improvements to flexigrid quick search on YMarks.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 20:11:58 +00:00
orbiter
6fa439c82b - refactoring of robots
- added option to crawler to send error-URLs to solr
- changed solr scheme slightly (no multi-value fields where no multi values are)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 14:05:51 +00:00
apfelmaennchen
e7c2ea193b YMark:
- general improvements on importers, especially on auto tagging
- added get_tags (needed for tag clouds etc.)
- improved flexigrid support
- added YMarks.html (not fully working) that will eventually replace Bookmarks.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-01 21:42:48 +00:00
orbiter
3b578a28ef some patches to prevent that empty or bad IP information is broadcasted
- on client-side: fix bad IP reports from remote Peers by replacing their reported IP with their server IP if the reported IP is bad, broken or disallowed
- on server-side: the same during a peer ping (here the ping'ed server acts also as client during the back-ping) and also when receiving a message or a search where the client sends also its seed. Here the IP is replaced by the client IP if the reported IP is broken or bad

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 10:58:12 +00:00
orbiter
361841df16 another patch according to http://bugs.yacy.net/view.php?id=26#c36
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 02:26:50 +00:00
orbiter
37fede9d30 better logic for proper seed ip recognition and better error messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 02:19:13 +00:00
orbiter
8b95a26866 better magic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 02:00:37 +00:00
orbiter
2700a58e5a added a magic to the peer ping that will be used in case that the contacting peer requests that its reported IP shall be used for a back-ping. The back-ping now also returns the same magic, which makes it possible for the requested peer to verify that the back-pinged peer is actually the same peer.
This is also a protection against a forced fake of an external IP: if such an IP was faked, then the next ping from the affected peer to another peer looks like a staticIP report. Such a bad staticIP-by-faked-response can now be discovered and fixed by the peer that gets the second ping after the first ping contained a faked response.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 01:52:20 +00:00
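The magic check from commit 2700a58e5a is essentially a nonce that must be echoed by the back-ping. The sketch below only illustrates that idea; the class name, the 64-bit random value and the string transport are assumptions and say nothing about the actual YaCy wire format.

```java
import java.security.SecureRandom;

// Hypothetical helper, not YaCy code: nonce generation and verification for a back-ping.
final class PingMagicSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Creates a fresh magic value to send along with the ping.
    static long newMagic() {
        return RANDOM.nextLong();
    }

    // True only if the back-ping echoed exactly the magic that was sent.
    static boolean backPingVerified(long sentMagic, String echoedMagic) {
        if (echoedMagic == null) return false;
        try {
            return Long.parseLong(echoedMagic.trim()) == sentMagic;
        } catch (NumberFormatException e) {
            return false; // a garbled or missing magic never verifies
        }
    }
}
```

A peer that answers from a faked IP cannot know the magic, so its back-ping response fails this check.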
orbiter
8879cc1db2 removed System.out.println
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 14:08:02 +00:00
orbiter
f6077b3cc0 added more attributes for html parser and enhanced data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 13:09:01 +00:00
f1ori
0b02083e97 * function for simple crawl of one url
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 13:04:33 +00:00
f1ori
d671de8c17 add ranking weight to json-search-results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 11:18:14 +00:00
orbiter
d8e934c085 better abstraction of http client identification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-26 13:35:29 +00:00
sixcooler
a3e707283d not using HTTPConnector anymore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-26 11:46:31 +00:00
orbiter
b77b8cac0c - enhanced html parser: recognized much more details in the content
- added more properties to solr index
- refactoring
- more constants in switchboard
- fix for some NPEs
- recognition of more images
- removed synchronization in HandleMap (obviously not necessary?)
- added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as being local.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-21 13:58:49 +00:00
low012
bc84d2bc9d *) fixed typo in stop script
*) added <u> </u> tags for underlined text in Wiki Code
*) minor code changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-20 22:54:29 +00:00
apfelmaennchen
b2281f0b7d YMark: intermediate work towards flexigrid support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-20 22:33:01 +00:00
low012
06d50fd801 *) fixed stupid bug (introduced in r7663 by myself) which caused wrong parsing of Wiki pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-20 17:27:59 +00:00
apfelmaennchen
60412d2bb3 YMark:
- more refactoring >> YMarkEntry
- integration of SurrogateReader as bookmark importer
- various small bug fixes e.g. get_xbel.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-18 21:42:14 +00:00
orbiter
3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
- added deletion of solr index in IndexControlRWIs
- added asynchronous adding of large url lists (happens when crawls are started with file)
- fixed npe in Image display
- replaced language warning with fine logging
- added a domain name cache in Domains that helps to speed up the isLocal property (less DNS lookups)
- added a new storage class for this new cache: KeyList. The domain key list is stored in DATA/WORK/globalhosts.list
- added concurrent solr updates and chunked transfers (50 documents until a commit is done) for high-speed feeding (> 40000 ppm)
- fixed a bug in content scraper that chopped off large parts of crawl lists (using crawl start from file)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-18 16:11:16 +00:00
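The chunked transfer mentioned in commit 3d5104d357 (a commit after every 50 documents) can be sketched as a simple batching step. The names are hypothetical and the sketch leaves out the Solr client entirely; it only shows how a document list would be cut into commit-sized chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not YaCy code: splits a feed into fixed-size chunks.
final class ChunkedFeedSketch {
    // Chunk size named in the commit message.
    static final int CHUNK_SIZE = 50;

    // Returns consecutive chunks of at most 'size' elements; the last one may be shorter.
    static <T> List<List<T>> chunks(List<T> docs, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += size) {
            int to = Math.min(from + size, docs.size());
            out.add(new ArrayList<>(docs.subList(from, to)));
        }
        return out;
    }
}
```

A feeder would then send each chunk and issue one commit per chunk, which keeps the number of commits low compared to committing every document.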
orbiter
fd3baa9025 fix for http://bugs.yacy.net/view.php?id=24
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-17 22:37:04 +00:00
low012
2e9694c9e9 *) removed recursion which hopefully prevents exception
*) fixed bug in creation of table of content which caused double entries if a page was previewed more than once

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-17 21:02:18 +00:00
apfelmaennchen
a2e86daae9 YMark: more bug fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 22:09:50 +00:00
apfelmaennchen
62855f9567 YMark: code clean up and some small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 21:19:42 +00:00
apfelmaennchen
667e912b19 YMark:
- some improvements to firefox json bookmark importer
- test import with: /api/ymarks/test_import.html
- view ymarks with: /api/ymarks/test_treeview.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 09:09:33 +00:00
apfelmaennchen
a0e4960a4d YMark:
- first attempt for a firefox json bookmark importer
- added JSON library json-simple-1.1.jar

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 20:58:58 +00:00
orbiter
958ff4778e enhanced location search:
search is now done using verify=false (instead of verify=cacheonly), which allows many more targets to be found.
This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed.

Added also location parsing from wikimedia dumps. A wikipedia dump can now also be a source for a location search.
Fixed many smaller bugs in connection with location search.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 15:54:19 +00:00
orbiter
19fd13d3bc Added federated index storage to solr.
YaCy supports now the storage to remote solr indexes.
More federated storage (and search) methods may follow.

The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can be used as solr configuration
This is also the same scheme that solr uses if documents are imported with apache tika.

federated solr storage is switched off by default.

To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both, YaCy and solr indexes.
- to check what is in solr after indexing, open http://localhost:8983/solr/admin/

So far it is not possible to search that solr index with YaCy.
The storage functionality is available now for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
   that YaCy provides (like dublin core reading, wikimedia dump reading, rss feed reader etc) if people still
   want to use solr instead of YaCy.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-14 20:05:04 +00:00
orbiter
c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 22:04:23 +00:00
orbiter
01690eab86 fix for mediawiki importer and wikicode parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 13:22:27 +00:00
orbiter
c5352e6872 added new SearchResult class (to be used later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 06:16:31 +00:00
orbiter
4c013d9088 more UTF8 getBytes() performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-12 05:02:36 +00:00
apfelmaennchen
78d6d6ca06 refactoring for ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-08 21:15:10 +00:00
orbiter
a47bdc405b better logging for robinson selection according to peer tag
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7645 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-05 08:04:25 +00:00
orbiter
cafcb1f9ed removed the DNS resolving for web structure computation from the indexing queue and placed it in a concurrent computation queue that does not block the crawler. Makes crawling faster and less DNS-speed-dependent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-04 22:01:07 +00:00
orbiter
17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-04 12:20:20 +00:00
orbiter
96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-04 09:47:18 +00:00
orbiter
b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
(directly or indirectly) and it grants a crawl-delay of 0. Then all forced pause mechanisms in YaCy are switched off and the domain is crawled at full speed.
crawl delay values can be assigned to either
- all yacy peers using the user-agent yacybot
- a specific peer with peer name <peer-name>.yacy or
- a specific peer with peer hash <peer-hash>.yacyh


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-03 23:39:45 +00:00
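The three ways of addressing a peer listed in commit b2fe4b7b1a can be expressed as a small matching function. This is a sketch with hypothetical names; case-insensitive matching of the bot and peer name and exact matching of the hash are assumptions, and the "indirect" addressing mentioned in the commit message is not modelled.

```java
// Hypothetical helper, not YaCy code: decides whether a robots.txt entry lifts the crawl pause.
final class RobotsAgentSketch {
    // True if the user-agent value addresses all YaCy peers or this specific peer.
    static boolean addressesPeer(String userAgent, String peerName, String peerHash) {
        String agent = userAgent.trim();
        return agent.equalsIgnoreCase("yacybot")
                || agent.equalsIgnoreCase(peerName + ".yacy")
                || agent.equals(peerHash + ".yacyh"); // hashes are compared exactly
    }

    // Full-speed crawling applies only if the peer is addressed and the crawl-delay is 0.
    static boolean fullSpeedAllowed(String userAgent, int crawlDelaySeconds,
                                    String peerName, String peerHash) {
        return crawlDelaySeconds == 0 && addressesPeer(userAgent, peerName, peerHash);
    }
}
```

Any other user-agent value, or a crawl-delay greater than 0, leaves the forced pause mechanisms in place.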
orbiter
cb6f709a16 - enhancements in surrogate reading
- better display of map in location search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-02 00:11:37 +00:00
low012
1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
*) cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-01 23:32:40 +00:00
orbiter
156cf02703 - added an index constraint 'has location' to the condenser
- added evaluation of the 'has location' constraint to search using the /location operator


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-31 09:41:30 +00:00
orbiter
0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
- added parser for in-text appearing geo-locations
- added geo-locations to rss search result
- added evaluation of metadata-attached geo-locations in yacysearch_location to show search results within a map


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 23:26:36 +00:00
orbiter
9b25d07295 - added geo information parsing to html parser
- extended metadata information in index with geolocalisation
- added display of location in yacydoc and ViewFile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 00:49:47 +00:00
f1ori
efcf37a953 * show info in log, if robots.txt is rejected due to wrong mime-type
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-28 19:55:15 +00:00
low012
16cd919795 *) fixed Exceptions which caused 500 error when entering invalid URL mask or invalid prefer mask, invalid masks are ignored, error message is displayed on yacysearch.html (what about yacysearch.rss and yacysearch.json?)
*) fixed "more options" link on yacysearch.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 00:48:19 +00:00
low012
1a24917cea *) fixed NPE which occurred when empty String was entered as search word
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7622 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 00:44:38 +00:00
orbiter
b1a8d0c020 enhancements to web cache and less strict caching rules
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 10:35:26 +00:00
orbiter
f3baaca920 - enhancements to DNS IP caching and crawler speed
- bugfixes (NPEs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 09:34:10 +00:00
low012
e7860b1239 *) <mode="Homer">D'oh!</Homer>
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 22:23:20 +00:00
low012
82f1580a60 *) trying to fix ConcurrentModificationException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 22:20:19 +00:00
low012
9f0286b380 *) fixed potential "java.lang.IllegalArgumentException: Illegal group reference" which occurred if special characters which are also used as metacharacters in regular expressions were used inside of <pre>...</pre> (see: http://veerasundar.com/blog/2010/01/java-lang-illegalargumentexception-illegal-group-reference-in-string-replaceall/)
The class still contains a potential ConcurrentModificationException which occurs when the List which contains the elements of the table of content is modified during a recursion of tagReplace(). Will try to fix this later today.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 18:02:09 +00:00
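The "Illegal group reference" failure described in 9f0286b380 can be reproduced and avoided roughly as follows. This is a minimal sketch and not the code from the commit: the class name `PreBlockReplace`, the method `insertLiteral`, and the `@@` placeholder are hypothetical. It relies only on `String.replaceAll` and `Matcher.quoteReplacement` from the JDK.

```java
import java.util.regex.Matcher;

// Hypothetical helper, not taken from the YaCy sources.
final class PreBlockReplace {

    // Replaces every match of placeholderRegex in template with rawText,
    // treating rawText as literal text. Matcher.quoteReplacement escapes
    // '$' and '\' so they are not interpreted as group references.
    static String insertLiteral(String template, String placeholderRegex, String rawText) {
        return template.replaceAll(placeholderRegex, Matcher.quoteReplacement(rawText));
    }

    public static void main(String[] args) {
        String raw = "price in $USD";
        // Without quoting, "$U" is parsed as a group reference and
        // replaceAll throws IllegalArgumentException.
        try {
            "<pre>@@</pre>".replaceAll("@@", raw);
        } catch (IllegalArgumentException e) {
            System.out.println("unquoted: " + e.getMessage());
        }
        System.out.println(insertLiteral("<pre>@@</pre>", "@@", raw));
    }
}
```

Quoting only the replacement string keeps the search pattern a regular expression; if the pattern itself came from user text, it would need `Pattern.quote` as well.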
orbiter
78d4c45d09 enhancement during search process: fast fail of search in case that all index feeders have terminated.
This change should affect filtering and navigators and should cause that search navigation gets faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 13:05:51 +00:00
orbiter
ba03ca8620 added more configuration options for search:
- removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal
- added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal
- added configuration of navigation options to ConfigPortal
- added an option to switch off automatic index cleaning in case that a link verification method fails


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 07:50:34 +00:00
f1ori
e0c7d490f9 * fix bug #6
* exclude signature files from auto-deletion of unknown files in DATA/RELEASE


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-20 17:59:58 +00:00
orbiter
a50f28e6e7 - fixed missing save operation for peer name change
- fixed import of mediawiki dump files
- added script to add mediawiki dump files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-19 23:52:09 +00:00
orbiter
2b5f8585bf performance hack for Balancer and ip address parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7608 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-17 21:09:18 +00:00
low012
2861d0888a *) simplified code
*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
orbiter
1989ebc24b removed more warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 22:52:30 +00:00
orbiter
b62b79675b removed type cast warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 21:08:18 +00:00
orbiter
8f11d3a5bb redesigned the ScoreMap classes:
- new concurrent score map using atomic operations from the java concurrency classes
- redesigned the difference between StaticScore and DynamicScore into ScoreMap and ReversibleScoreMap; many classes can now use simple ScoreMap objects, which work better in concurrent environments using the ConcurrentScoreMap
- switched from DynamicScore to ConcurrentScoreMap usage wherever possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-13 01:41:44 +00:00
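The idea behind a score map built on atomic operations (8f11d3a5bb) can be sketched as below. This is an illustration under assumptions, not YaCy's ConcurrentScoreMap: the class name `SimpleConcurrentScoreMap` and its two methods are hypothetical, and it uses `computeIfAbsent`, which needs Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a lock-free score counter.
final class SimpleConcurrentScoreMap<E> {

    private final ConcurrentHashMap<E, AtomicLong> scores = new ConcurrentHashMap<>();

    // Atomically adds one to the score of key and returns the new score.
    long inc(E key) {
        return scores.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns the current score, or 0 if the key was never counted.
    long get(E key) {
        AtomicLong counter = scores.get(key);
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleConcurrentScoreMap<String> map = new SimpleConcurrentScoreMap<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) map.inc("yacy");
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(map.get("yacy")); // prints 4000
    }
}
```

No `synchronized` block is needed because both the map insertion and the increment are atomic on their own.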
orbiter
a564230c48 more enhancements against blocked threads that occurred in seed age evaluation (blocks httpd in some cases)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7585 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-12 22:54:41 +00:00
orbiter
dc0db3550e avoid string conversion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-11 00:59:27 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
orbiter
30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7580 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 12:35:32 +00:00
orbiter
1214615185 fix for 'invisible entry', see http://forum.yacy-websuche.de/viewtopic.php?p=22133#p22133
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 17:04:34 +00:00
orbiter
3820525464 more memory protection: auto-flush of caches in case of memory shortage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 16:32:34 +00:00
orbiter
7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
1) if the file is changed for a re-crawl this is not reflected in the steering because it would take the previously uploaded crawl start file
2) browsers do not submit the full path of the selected file even if this path is shown in the input field because of security reasons. There is no work-around or hack to make the submission of the full path possible

- fixed deletion of crawl start point urls in crawl stack and balancer double-check
- fixed a problem with steering self-call (no resolving of localhost)
- added more logging for the crawler to supervise why crawl urls are not taken by the loader
- added a javascript onload-function to select domain restriction in all cases where a crawl is started from a file or from a url
- fixed the restrict-to-domain pattern computation, added a 'www.'-prefix and added this functionality also to a crawl start from file 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 12:50:39 +00:00
orbiter
e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 09:29:05 +00:00
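Guessing the StringBuilder size up front, as e1b6916423 describes, might look like the sketch below. The class `PresizedJoin` and its `join` method are hypothetical examples, not code from the commit.

```java
// Hypothetical example of pre-sizing a StringBuilder.
final class PresizedJoin {

    // Joins parts with sep. The capacity is computed first so that the
    // builder's internal array never has to grow while appending.
    static String join(String[] parts, char sep) {
        int size = Math.max(0, parts.length - 1); // one char per separator
        for (String p : parts) size += p.length();
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(sep);
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"http", "https", "ftp", "smb"}, ','));
    }
}
```

When the exact size is not known, a rough upper bound still avoids most re-allocations, at the cost of some unused capacity.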
low012
3b40b98256 *) set SVN properties
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 01:51:51 +00:00
orbiter
2af8e33773 better performance computing search targets with index abstracts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 23:32:01 +00:00
orbiter
619b561a4a enhanced secondary search: index abstracts decompression is now much faster and does not cause strong CPU load after several searches with more than one word
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 23:12:39 +00:00
orbiter
27ecdb5444 use less peers for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7561 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 21:24:46 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' calls that used the default encoding (charset missing) or the UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
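The change in cb1f49d0f2 (and the related getBytes() move in 30aed9824a) can be illustrated with a small helper. This is a sketch under assumptions: the class `Utf8Sketch` is hypothetical and uses `StandardCharsets.UTF_8`, which exists since Java 7; on the Java 6 of that time a constant would have been obtained once via `Charset.forName("UTF-8")`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the UTF8 class from the YaCy sources.
final class Utf8Sketch {

    // Resolved once; callers never pass the charset name as a String,
    // so no name lookup happens per conversion.
    static final Charset CHARSET = StandardCharsets.UTF_8;

    static String string(byte[] bytes) {
        return new String(bytes, CHARSET);
    }

    static byte[] getBytes(String s) {
        return s.getBytes(CHARSET);
    }

    public static void main(String[] args) {
        byte[] encoded = getBytes("Gr\u00fc\u00dfe");
        System.out.println(encoded.length + " bytes, round trip ok: "
                + string(encoded).equals("Gr\u00fc\u00dfe"));
    }
}
```

Passing a `Charset` object also removes the checked `UnsupportedEncodingException` that the String-name overloads declare.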
orbiter
7138f4036b less synchronization, better thread dump tool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 15:29:45 +00:00
orbiter
8d14916c74 more patches for a better out-of-memory management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 01:45:11 +00:00
orbiter
c2c5b12882 - even less memory for circle tool
- background thread for bookmark initialization: this uses a DNS lookup which may cause long waiting times during startup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7554 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-06 12:30:22 +00:00
orbiter
799c534935 one more patch against OOM during secondary remote search
see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3202

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-05 19:52:34 +00:00
orbiter
77b1e921a9 this assert prevents a network operation in case of sabotage and must therefore be removed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7550 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 14:34:37 +00:00
orbiter
bed79402be introduction of a new remote search load control: the remote search has taken 10 results per peer with a time-out of 3 seconds so far. The attributes of number of results per peer and time-out time can now be configured.
This has two aspects: the user who searches may want to increase these values to get more results, at the cost of more load on the remote side, and the user of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-side maximum load parameters are defined by the network definition and the client-side search request load can be defined by each user individually, but when the remote search is done the requested service is limited to the network definition.

You can find now in the network definition file:
network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime
and in the yacy.conf file:
remotesearch.maxcount and remotesearch.maxtime

There is currently no web interface to define the client-side remote search attributes, please set them manually
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 13:44:00 +00:00
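The interplay of the four settings named in bed79402be can be sketched as a simple clamp. The property names come from the commit message; everything else is an assumption for illustration: the class `RemoteSearchLimits`, the method `effective`, the use of `java.util.Properties`, and the fallback value are hypothetical, and the unit of maxtime is not stated in the source.

```java
import java.util.Properties;

// Hypothetical sketch: the client may request less than the network
// definition allows, but never more.
final class RemoteSearchLimits {

    // key is "remotesearch.maxcount" or "remotesearch.maxtime".
    static int effective(Properties network, Properties client, String key, int fallback) {
        String def = String.valueOf(fallback);
        int limit = Integer.parseInt(network.getProperty("network.unit." + key, def).trim());
        int requested = Integer.parseInt(client.getProperty(key, def).trim());
        return Math.min(requested, limit);
    }

    public static void main(String[] args) {
        Properties network = new Properties();
        network.setProperty("network.unit.remotesearch.maxcount", "20");
        Properties client = new Properties();
        client.setProperty("remotesearch.maxcount", "50");
        // The request for 50 results is cut down to the network maximum.
        System.out.println(effective(network, client, "remotesearch.maxcount", 10)); // prints 20
    }
}
```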
orbiter
6dfaf6fef7 fix for bug in deletion of old seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 10:00:37 +00:00
orbiter
993b9bc1a8 memory/performance hacks, less synchronization, better concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-03 11:30:04 +00:00
orbiter
42d90664f3 - fixed a memory leak in the httpc.post method (no finish)
- patched some more memory-saving relevant code
- some more minor bug fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-01 09:03:33 +00:00
orbiter
38dce547c0 better concurrency (less locking on date formatting) more logging and minor bug fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-28 06:28:29 +00:00
f1ori
59dea3a284 * implement url proxy, a proxy via the url http://peer:port/proxy.html?url=http://domain.tld/path
* enable with proxyURL = true
* could be useful to browse specific pages with proxy or use own improvements in proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7538 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 21:39:38 +00:00
mikeworks
8b7b783c49 Tray.java: Broke the build with a file that was not UTF-8 encoded and contained french umlauts (unmappable character for encoding UTF8)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7537 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 15:01:46 +00:00
mikeworks
db65ada467 Tray.java: Added localization for the french tray icon command, although this can probably also be done better than with if statements (preferably from the locales file)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 11:42:33 +00:00
orbiter
89d337841c more logging for OOMs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7534 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-26 09:27:15 +00:00
orbiter
b1781d7aae some more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7533 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-26 01:24:49 +00:00
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
orbiter
dec24244cf added convenience class to generate UTF StringBody objects with a default UTF8 charset.
Reason: if this is not used in StringBody-Class initialization, a default charset name is parsed.
This is a synchronized process and all classes using default charsets synchronize at that point
Synchronization is omitted if this class is used
 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7530 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 13:26:09 +00:00
orbiter
1110d16af9 performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7529 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 12:41:27 +00:00
orbiter
19b2a50578 - enhanced date formatter cache
- added more instances of formatter objects to different classes to make them independent in case of locking that may apply during synchronization of the date formatter object (date formatting is not thread-safe and must therefore be synchronized)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7528 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 12:23:00 +00:00
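The approach from 19b2a50578, giving each class its own formatter so that unrelated classes do not wait on one shared lock, could look like this. It is a sketch only: the class `LogTimestamps`, the pattern string, and the UTC time zone are assumptions, not values from the commit.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical class that owns a private formatter instance.
final class LogTimestamps {

    // SimpleDateFormat is not thread-safe, so access is synchronized.
    // Because the instance is private, only callers of this class can
    // contend for the lock.
    private static final SimpleDateFormat FORMAT = newFormat();

    private static SimpleDateFormat newFormat() {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    static String format(Date date) {
        synchronized (FORMAT) {
            return FORMAT.format(date);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // prints 1970-01-01 00:00:00
    }
}
```

A `ThreadLocal<SimpleDateFormat>` or, on Java 8 and later, the immutable `java.time.format.DateTimeFormatter` would remove the lock entirely; those are alternatives, not what the commit describes.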
orbiter
f2e8ffd768 enhancement in synchronisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7525 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 01:19:39 +00:00
orbiter
ad7fcb9d61 Enhanced Base64Order transformation: less overhead (transformation between StringBuilder and byte[])
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 00:56:31 +00:00
orbiter
0ce17d823a - fixed bug in ordering
- fixed ConcurrentModificationException in set join


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-24 10:32:46 +00:00
orbiter
dec4f36700 - fix for missing favicons in search widgets
- fix for bad digest/hash computation in case of interrupts to class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7518 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-23 23:04:05 +00:00
orbiter
e3ef4e3021 - increased default peer ping time from 2 minutes to 1 minute
- filtering out too old peers when reading seed lists (limit is now 240 minutes)
- added concurrent host names resolving in front of the http client because the http client uses the java built-in DNS resolve which is not multithreading-safe (i have seen deadlocks in thread dumps showing that this bug in jdk is still there)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7515 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-23 09:42:01 +00:00
orbiter
cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 22:58:19 +00:00
orbiter
d28f8040e0 removed unnecessary recording function that also caused a performance problem after serving too many files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 13:33:28 +00:00
orbiter
af87af0d4c - removed synchronization in serverSwitch which should improve speed
- fixed wrong assert in network graph
- enhanced double check method in table class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 12:56:25 +00:00
orbiter
4bd65532da initialization of libraries concurrently (faster start-up)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 10:57:38 +00:00
orbiter
57e6728cb7 - removed usage of /etc/alternatives/www-browser because of problems with lynx, see:
http://forum.yacy-websuche.de/viewtopic.php?p=21959#p21959
  please look if the browser that is linked with /etc/alternatives/www-browser can be detected and insert call again if
  it can be made sure that this does not call lynx
- replaced severe warnings with just warnings in yacyClient

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 11:53:45 +00:00
orbiter
d84b4a072e healing for some OOM problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 00:38:49 +00:00
orbiter
82f262f685 - enhanced circle drawing speed
- beautified 'moving dot' feature (using smaller and correctly positioned dots)
- added moving dots to DHT transfer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7500 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 00:03:11 +00:00
orbiter
29dc416ac6 more animations in graphics. See network and access picture.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-19 01:56:12 +00:00
orbiter
a80ee9a03d THE GRID is coming to YaCy .. see new animated graphics on http://localhost:8090/AccessGrid_p.html
showing incoming and outgoing connections in an animated way

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-18 23:19:35 +00:00
low012
ce012e11aa *) deleted LogStatistics since the page did not work anymore and it seemed to be obsolete, tell me if you miss it and I will add it again
*) a few minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-18 01:46:07 +00:00
low012
c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
*)  cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-18 01:25:46 +00:00
orbiter
d58071947a maybe terminateOldSessions is too slow, removed sleep
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-17 23:36:46 +00:00
orbiter
6c52e31993 new methods to open a browser
- if YaCy is started with the option -gui, it is not in headless mode. Then the java 1.6 browse method is used if all other methods fail
- in linux, the path /etc/alternatives/www-browser is used if no firefox is installed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-14 16:15:14 +00:00
orbiter
5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased.
Please see the new comments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-13 17:37:28 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
low012
64f32e8f00 *) replaced all IPs in IP filters for proxy with the proper regular expression
*) some cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-11 23:37:13 +00:00
orbiter
93732d6773 increased number of target peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-04 13:25:28 +00:00
orbiter
70ca7cec8c fix for http://forum.yacy-websuche.de/viewtopic.php?p=21763#p21763
and another fix for non-working global search when search options are switched off

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-03 10:43:09 +00:00
orbiter
fe93caac5a added flags and administration options to show advanced search and to show search result attributes (for each search result)
Administration can be done at ConfigPortal.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:54:13 +00:00
orbiter
5905f912c5 replaced more double types with float
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7462 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:22:00 +00:00
orbiter
0cdfb82963 replaced more appearance of double values by float values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:06:29 +00:00
orbiter
eb12e15738 moved all Double values to Float values because of
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-01 23:49:11 +00:00
f1ori
982aa689ef * fix StringIndexOutOfBoundException in WebStructureGraph
* add better escaping to saveMap and loadMap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-31 14:25:09 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the own seed file gets the lead for storage of the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to BasicConfig in case that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page BasicConfig.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
orbiter
28f669bf0b - fixed/enhanced move to SD/16:9 images (network, web structure)
- added logging in peer ping to analyse time-consuming elements which could be cause for disappearing peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7450 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 10:16:14 +00:00
orbiter
0376f73fdb extended seed list uploader: do not only upload all active peers but also some more peers that are passive but had been active in the last 24 hours
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7449 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 23:21:33 +00:00
orbiter
991b92f4ae enhanced network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 13:52:46 +00:00
orbiter
3ae8f40fc8 removed yacy.network.group - this feature was never used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:50:36 +00:00
orbiter
efb4ca8fa8 modified auto-delete of search failure-words:
- words are now not deleted from the search index automatically if index receive is switched off
- a flag in the network definition defines if this feature is switched on at all
- the search filter for not-found word references is switched off for server-side remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:46:00 +00:00
orbiter
f1f03d8c90 more logging for strange network loading bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-19 09:31:56 +00:00
f1ori
4e29e9712a * create cleanupjob for cached failed urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 15:04:00 +00:00
f1ori
a321c7673d * adminAccountForLocalhost only for localhost
* yacy crawls local domains also, if no password is set (the interface is already protected)
* it's not required anymore, to set a password in intranet mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 11:37:30 +00:00
low012
48463c4507 *) General private License? ;-)
*) minor code changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:28:08 +00:00
orbiter
c93f4dda72 - cleaned up yacy news
- removed unused methods
- avoid news generation in case that the peer runs in robinson mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:00:14 +00:00
orbiter
6c1b14c8e1 - more control in access tracker: count number of returned search results (not only info how much is in the index)
- extended query params for this
- enhanced cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-11 22:58:14 +00:00
low012
9f38c0023d *) Minor changes, mainly cleaning up a little bit, no functional changes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 20:24:52 +00:00
orbiter
54e77e6255 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 08:40:41 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
lotus
0e54233408 UPnP: map port again if we are not reachable (e.g. when router rebooted)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 21:17:21 +00:00
lotus
b1484299b2 same units for memory observer configuration (MiB)
old setting for DHT (RAM) will be lost after update
can be set on /Performance_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 20:38:01 +00:00
orbiter
89ae6101b9 fix for NPE and added comment in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 14:51:07 +00:00
orbiter
0769f4caa6 added search suggestions for interactive search: is only shown if there are no search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 14:30:25 +00:00
orbiter
a4c9d27287 - moved some variables from Switchboard to new class AccessTracker
- added a limitation in access tracking to delete queries which are older than 10 minutes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 01:54:27 +00:00
f1ori
e4aabaa1c3 * fix negative filelength for files >2G
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 17:25:39 +00:00
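A negative length for files above 2 GB, as fixed in e4aabaa1c3, is the typical symptom of a size passing through a 32-bit `int`. The commit message does not state the actual cause, so the following is only an assumed illustration; the class `FileLengthSketch` and its method are hypothetical.

```java
// Hypothetical demonstration of int truncation of a file length.
final class FileLengthSketch {

    // The lossy pattern: a long length narrowed to int wraps around
    // once it exceeds Integer.MAX_VALUE (2^31 - 1 bytes).
    static int truncatedLength(long length) {
        return (int) length;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("as long: " + threeGiB);                  // 3221225472
        System.out.println("as int:  " + truncatedLength(threeGiB)); // -1073741824
    }
}
```

Keeping the value in a `long` from the point where it is read to the point where it is displayed avoids the wrap-around.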
orbiter
cdfe8afe3f fix for really bad table iteration implementation: reduction of IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 16:44:55 +00:00
f1ori
ee3cef91e8 * fix filesize in ftp crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 02:15:22 +00:00
orbiter
b2ed4cfaf8 more small bugfixes and light refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 01:57:05 +00:00
low012
3d95981f7d *) cleaning up the code a little bit
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 17:07:21 +00:00
orbiter
6b70393d1d - new java version 1.6
- replaced old gif animator by java 1.6 gif animator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-20 22:51:50 +00:00
orbiter
e88c428008 fix to ftp loader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-18 10:22:54 +00:00