Commit Graph

68 Commits

Author SHA1 Message Date
sixcooler
59b767eebd stop loading via http at defined maximum of bytes - even size is unknown before loading
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:28:23 +00:00
sixcooler
aff875baef smaler ping-entry @ ProfilingGraph
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-15 09:14:21 +00:00
orbiter
11dc653de3 added a visualization of peer pings to the performance graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 07:07:06 +00:00
orbiter
3a191cdf14 because newbies are scared about the memory consumption in the performance graph and arguments about high memory consumption according to bad knowledge about java garbage collection techniques, the memory display had been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page where the memory graph is just like it was.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 03:25:57 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
orbiter
bce280a308 update on options for interface graphics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 22:48:21 +00:00
orbiter
2683162ec5 - added more options to access grid picture, web structure picture and network graphics
- remove test class


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-02 23:27:26 +00:00
orbiter
a7a6b392f5 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 10:16:43 +00:00
orbiter
0e9a99cb05 another resource hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 07:51:18 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
orbiter
10e2f588f8 - enhanced ybr ranking computation
- many speed/performance hacks
- added solr charding and new charding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 10:57:02 +00:00
orbiter
bd55dcee50 - commented out experimental distributed ranking loading
- less threads for blocking threads
- disable all threads for DHT transmission for networks with zero peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 21:08:01 +00:00
orbiter
b45701d20f this is a re-implementation of the YaCy Block Rank feature
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribition and commit it here also
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit but it does not affect your current installation only fresh peers
- a recursive block rank refinement is implemented but disabled at this point. it needs more testing

Please play around with the ranking settings and see if this helped to make search results better.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-18 14:26:28 +00:00
orbiter
021840e5ba removed (almost) deadlocks and unnecessary CPU load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-17 00:00:01 +00:00
orbiter
123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently only serves for indexes to the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces.

The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface provides already a partial backward index and therefore a collection of all these indexes needs only to be joined which is very easy. The result should be the computation of new block rank tables that all peers can perform.

To reduce load from peers this servlet buffers all data and refreshes it only once in 12 hours. This very slow update cycle is needed because the interface will be called round-robin from all peers once after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-15 22:57:31 +00:00
orbiter
b77b8cac0c - enhanced html parser: recognized much more details in the content
- added more properties to solr index
- refactoring
- more constants in switchboard
- fix for some NPEs
- recognition of more images
- removed synchronization in HandleMap (obviously not necessary?)
- added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as beeing local.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-21 13:58:49 +00:00
orbiter
fd3baa9025 fix for http://bugs.yacy.net/view.php?id=24
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-17 22:37:04 +00:00
orbiter
cafcb1f9ed removed the DNS resolving for web structure computation from the indexing queue and placed it in a concurrent computation queue that does not block the crawler. Makes crawling faster and less DNS-speed-dependent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-04 22:01:07 +00:00
orbiter
b1a8d0c020 enhancements to web cache and less strict caching rules
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 10:35:26 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
low012
3b40b98256 *) set SVN properties
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 01:51:51 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
8d14916c74 more patches for a better out-of-memory management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 01:45:11 +00:00
orbiter
b1781d7aae some more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7533 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-26 01:24:49 +00:00
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
orbiter
af87af0d4c - removed synchronization in serverSwitch which should improve speed
- fixed wrong assert in network graph
- enhanced double check method in table class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 12:56:25 +00:00
orbiter
82f262f685 - enhanced circle drawing speed
- beautified 'moving dot' feature (using smaller and correctly positioned dots)
- added moving dots to DHT transfer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7500 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 00:03:11 +00:00
orbiter
29dc416ac6 more animations in graphics. See network and access picture.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-19 01:56:12 +00:00
orbiter
a80ee9a03d THE GRID is coming to YaCy .. see new animated graphics on http://localhost:8090/AccessGrid_p.html
showing incoming and outgoing connections in an animated way

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-18 23:19:35 +00:00
f1ori
982aa689ef * fix StringIndexOutOfBoundException in WebStructureGraph
* add better escaping to saveMap and loadMap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-31 14:25:09 +00:00
orbiter
991b92f4ae enhanced network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 13:52:46 +00:00
low012
9f38c0023d *) Minor changes, mainly cleaning up a little bit, no functional changes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 20:24:52 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
f1ori
9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 19:20:00 +00:00
orbiter
a9f754c45f removed unused CR accumulation and distribution process
this was never used and extended in the last years. The resulting YBR ranking criteria
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 11:07:42 +00:00
f1ori
7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-26 16:10:20 +00:00
orbiter
377f001e0d sorting of crawl profile names in crawl profile editor, see
http://forum.yacy-websuche.de/viewtopic.php?p=20851#p20851

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 09:09:38 +00:00
orbiter
83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before)
- added use of LookAheadIterator which should prevent mistakes when coding iterators with embedded iterators
- added a fail-safe reaction in case of database corruption using iterators over database elements (no interruption then)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 10:43:14 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
86d7f8a989 - the web visualization can now be generated in custom color
- added input fields in WatchWebStructure_p.html
- introduced enum classes for Draw Mode and Filter Mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 10:44:00 +00:00
orbiter
64d4204f44 fix for NPE in network image computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 08:18:17 +00:00
orbiter
b480b7a4d0 fix for bug in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 00:13:32 +00:00
orbiter
b12bfe1f91 better usage of OSM tile cache and YaCy cache by usage of better tile server computation based on a coordinate hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-08 23:51:37 +00:00
orbiter
388aa021c2 - concurrent loading of OSM tiles
- added  a 4-time re-try in case that tile server does not respond

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-08 23:14:08 +00:00
orbiter
610855e362 do not use network graph cache if called from authorized account
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:43:15 +00:00
orbiter
e7ea3b3cc5 added a buffer for network images to reduced load on yacy.net network image server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 12:45:53 +00:00
orbiter
d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 11:40:58 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents.
To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00