Commit Graph

5556 Commits

Author SHA1 Message Date
low012
f1244264b8 *) hopefully fixed bug reported in http://forum.yacy-websuche.de/viewtopic.php?t=2057
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-26 16:18:14 +00:00
lotus
2714ff034b avoid undefined in rssTerminal
thanks to freq.9! http://forum.yacy-websuche.de/viewtopic.php?p=14288

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-26 11:45:36 +00:00
orbiter
2e3186189b fix for mediawikiIndex surrogate producer + added concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:52:21 +00:00
apfelmaennchen
6f5ea7b1a8 small fix for previous post
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:28:08 +00:00
apfelmaennchen
2eabd989ce - added a log viewer to RichClient (alpha version, very slow)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 20:58:56 +00:00
apfelmaennchen
138a0747e3 added serverObjects.putJSON as JSON has very particular encoding requirements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 20:56:29 +00:00
apfelmaennchen
557c2a32a3 small fix for yacyui-portalsearch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 16:59:35 +00:00
apfelmaennchen
b4539a61dd some more documentation for yacyui-portaltest.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 15:14:10 +00:00
orbiter
64a63306b8 added portal-test explanation page to the customization submenu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 13:18:13 +00:00
apfelmaennchen
64ce9da60f - new yconf parameter global
- see http://forum.yacy-websuche.de/posting.php?mode=quote&f=9&p=14207#pr14207

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 13:08:07 +00:00
apfelmaennchen
5ca306da9a fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2054#p14306
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 12:24:40 +00:00
apfelmaennchen
9325198c42 hopefully a fix for http://forum.yacy-websuche.de/viewtopic.php?f=15&t=1762#p14305
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 08:15:27 +00:00
orbiter
d977dd9a96 fix for surrogate loader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 22:54:40 +00:00
apfelmaennchen
675f350d18 YaCy Portal Search Widget
- see http://localhost:8080/yacy/ui/yacyui-portaltest.html
- two new parameters (logo and link) for yconf as requested at http://forum.yacy-websuche.de/viewtopic.php?f=15&t=1762#p14101


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 17:21:31 +00:00
orbiter
9cb68353da fix for bug in ProfilingGraph for ppm >> 10000 ppm (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 13:18:20 +00:00
orbiter
9e4db75aac reduced internal logging and reduced memory that internal logging can use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5867 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 12:09:04 +00:00
orbiter
c10c257255 attempt to fix a deadlock situation where the IODispatcher did not work.
I suspect the dispatcher thread has crashed and queues filled so no indexing process was able to write data.
This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log output has been added.
Also added a new attribute indexer.threads to control the number of default threads for the indexer (default is 1)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 11:55:39 +00:00
orbiter
09987e93fd fixed some more bad handling of byte[]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 22:02:12 +00:00
orbiter
1bcc1450cb more explanatory error message in case of IOExceptions during html parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 21:18:01 +00:00
orbiter
fe51f4d668 less synchronization may help to prevent deadlocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 20:54:13 +00:00
orbiter
58802e4201 added missing success test in storeDocumentIndex,
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1922&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 20:38:56 +00:00
lotus
fbca4f8354 more stability on watchcrawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 18:42:15 +00:00
orbiter
171e62bee5 addition to the fix from last commit (which did not work)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 16:36:21 +00:00
orbiter
059949a0d1 tried to fix problem with snippet fetch for second search page when verify=false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 15:29:30 +00:00
lotus
b08991e278 moved some constants, rename of Tray class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 13:18:59 +00:00
orbiter
54773ad4d4 added release keys
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 22:46:42 +00:00
orbiter
138422990a - removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
- added some debugging output to balancer to find a bug
- removed unused classes for index collection handling
- changed some default values for the process handling: more memory needed to prevent OOM

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 22:39:12 +00:00
orbiter
1b9e532c87 some concurrency for wikipedia dump reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 17:43:27 +00:00
orbiter
dec495ac78 added dummy class for help page
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2033&hilit=&p=14107#p14107

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 13:59:20 +00:00
lotus
25d2160288 small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 13:19:37 +00:00
lotus
daea87d436 do not accept dht from bad versions
delete bad hashes on receive

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 12:13:37 +00:00
orbiter
16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format, do the following:
- download a wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
  from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion: open a command shell, change to the yacy home directory and execute
  java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

This generates a series of files in DATA/SURROGATES/in

If YaCy is running (it may run concurrently), it fetches all new dumps from the surrogate-in directory. The export process is transaction-safe, which means YaCy will not start reading a dump until the dump is completely finished.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
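Note on 16baa7ad24: the commit message does not say how the export is made transaction-safe. One common way to get this property is to write each file under a temporary name and rename it once it is complete, so a directory reader never sees a half-written file. The sketch below illustrates that pattern only; the class name, the ".tmp" suffix and the file names are assumptions for illustration and are not taken from the YaCy sources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YaCy: hands files to a watched directory
// so that a reader never picks up a partially written file.
final class SurrogateHandover {

    // Assumption: unfinished files carry this suffix and readers skip them.
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: write everything to "<name>.tmp", then rename in one step.
    static Path publish(Path inDir, String name, byte[] content) throws IOException {
        Path tmp = inDir.resolve(name + TMP_SUFFIX);
        Path done = inDir.resolve(name);
        Files.write(tmp, content);
        // Fails with AtomicMoveNotSupportedException if the file system
        // cannot rename atomically, instead of silently copying.
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: list only files that have been completely written.
    static List<Path> finishedFiles(Path inDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inDir)) {
            for (Path p : stream) {
                boolean unfinished = p.getFileName().toString().endsWith(TMP_SUFFIX);
                if (Files.isRegularFile(p) && !unfinished) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path inDir = Files.createTempDirectory("surrogates-in");
        publish(inDir, "dump-0001.xml", "<surrogates/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(finishedFiles(inDir));
    }
}
```

The temporary file is created in the target directory on purpose: a rename within one directory stays on one file system, which is where atomic renames are normally available.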
orbiter
0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 15:19:32 +00:00
orbiter
5195c94838 two patches for performance enhancements of the index handover process from documents to the index cache:
- one word prototype is generated for each document and re-used whenever a specific word is stored.
- the index cache now uses ByteArray objects instead of byte[] to reference the RWI. This speeds up access to the map that stores the cache. To dump the cache to the FS, the content must be sorted, but sorting takes less time than maintaining a sorted map during caching.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 14:23:04 +00:00
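Note on 5195c94838: a plain byte[] cannot serve as a content-based key in a Java hash map, because arrays inherit identity-based equals() and hashCode() from Object. A wrapper that compares by content makes an unsorted HashMap usable, which is presumably the role of the ByteArray class named in the commit. The sketch below uses a hypothetical ByteKey class as a stand-in; YaCy's actual ByteArray may be implemented differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ByteKeyDemo {

    // Hypothetical stand-in for a ByteArray-style key (not YaCy's class).
    static final class ByteKey {
        private final byte[] bytes;
        private final int hash; // cached; the key is treated as immutable

        ByteKey(byte[] bytes) {
            this.bytes = bytes.clone(); // defensive copy keeps the key stable
            this.hash = Arrays.hashCode(this.bytes);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    public static void main(String[] args) {
        byte[] a = "word".getBytes(StandardCharsets.UTF_8);
        byte[] b = "word".getBytes(StandardCharsets.UTF_8);

        // Raw arrays: equal content, but different objects, so no match.
        Map<byte[], Integer> raw = new HashMap<>();
        raw.put(a, 1);
        System.out.println(raw.containsKey(b)); // prints "false"

        // Wrapped arrays: lookup works by content.
        Map<ByteKey, Integer> wrapped = new HashMap<>();
        wrapped.put(new ByteKey(a), 1);
        System.out.println(wrapped.containsKey(new ByteKey(b))); // prints "true"
    }
}
```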
lulabad
06c878ed11 moved update_key to correct position in file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5848 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 10:15:12 +00:00
orbiter
9416f5c26f more speed test cases: kelondro provides map functions that are more than 20% faster than standard java classes and use less than half of the memory of java classes:
just start IndexTest (here with 1000000 test objects)

Performance test: comparing HashMap, TreeMap and kelondroRow
generated 1000000 test data entries 


STANDARD JAVA CLASS MAPS 

sorted map
time   for TreeMap<byte[]> generation: 2110
time   for TreeMap<byte[]> test: 2516, 0 bugs
memory for TreeMap<byte[]>: 29 MB

unsorted map
time   for HashMap<String> generation: 1157
time   for HashMap<String> test: 1516, 0 bugs
memory for HashMap<String>: 61 MB


KELONDRO-ENHANCED MAPS 

sorted map
time   for kelondroMap<byte[]> generation: 1781
time   for kelondroMap<byte[]> test: 2452, 0 bugs
memory for kelondroMap<byte[]>: 15 MB

unsorted map
time   for HashMap<ByteArray> generation: 828
time   for HashMap<ByteArray> test: 953, 0 bugs
memory for HashMap<ByteArray>: 9 MB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 09:29:08 +00:00
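Note on 9416f5c26f: the relative savings can be recomputed from the figures printed in the commit message. The sketch below only does that arithmetic; the numbers are copied from the log, their units are not stated there, and the class and method names are made up for this note.

```java
import java.util.Locale;

final class BenchmarkFigures {

    // Percentage saved when a value drops from 'before' to 'after'.
    static double percentSaved(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        String[] labels = {
            "sorted, generation", "sorted, test", "sorted, memory",
            "unsorted, generation", "unsorted, test", "unsorted, memory"
        };
        // {standard Java class, kelondro-enhanced}, as reported in the log
        double[][] figures = {
            {2110, 1781}, {2516, 2452}, {29, 15},
            {1157, 828}, {1516, 953}, {61, 9}
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%-22s %5.1f%% saved%n",
                    labels[i], percentSaved(figures[i][0], figures[i][1]));
        }
    }
}
```

With these inputs the unsorted maps save about 28.4% (generation), 37.1% (test) and 85.2% (memory), while the sorted maps save about 15.6%, 2.5% and 48.3%. On this single run, the "more than 20% faster" and "less than half of the memory" statements are therefore met by the unsorted HashMap<ByteArray> case; the sorted case comes close on memory but not on time.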
orbiter
b53790abb1 more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 07:39:21 +00:00
orbiter
8ffb9889e1 some fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 23:01:44 +00:00
orbiter
dfb96ecb72 more fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 22:08:38 +00:00
orbiter
1b8d346b4c fixes in connection with transition to byte[] hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 21:54:00 +00:00
f1ori
0b0a46d35a * fix transferRWI as suggested by celle (thanks!)
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 19:51:20 +00:00
orbiter
996572de95 quickfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 16:11:35 +00:00
orbiter
380ed2dac0 performance and debugging additions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 15:01:43 +00:00
lotus
635b0a9da7 code-split
allow cgi indexing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 13:28:28 +00:00
orbiter
e7559f3234 fix for http://forum.yacy-websuche.de/viewtopic.php?p=13977#p13977
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 10:06:55 +00:00
orbiter
fa3adbbfc6 added domain checks to surrogate reader and RWI transfer receiver to prevent spamming using surrogates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 06:38:28 +00:00
f1ori
76af84d732 * add custom comparator to ScoreCluster for byte[]
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-19 20:01:46 +00:00
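Note on 76af84d732: byte[] does not implement Comparable, so a sorted Java collection keyed by byte[] needs an explicit comparator; without one, TreeMap throws a ClassCastException on insertion. The sketch below shows the general idea with a plain lexicographic order over unsigned bytes. The ordering actually used by ScoreCluster is not given in the commit message, so both the comparator and the class name here are assumptions.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class ByteArrayOrderDemo {

    // Lexicographic order with bytes read as unsigned values;
    // on a shared prefix the shorter array sorts first.
    static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    public static void main(String[] args) {
        TreeMap<byte[], Integer> scores = new TreeMap<>(LEXICOGRAPHIC);
        scores.put(new byte[] {1, 2}, 10);
        scores.put(new byte[] {1, 2}, 20); // same content: replaces the first entry
        scores.put(new byte[] {1}, 5);
        System.out.println(scores.size()); // prints "2"
    }
}
```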
low012
31c6934df2 *) fix for r5832
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-19 07:40:23 +00:00
lotus
ab0030d7a7 allow dht-out for remote-crawl processing peers on default settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-18 20:04:01 +00:00
lotus
616a4d724f high-end favicon with 2 versions:
* true color + alpha channel for modern browsers
* 256 colors and non-transparent background for others

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5833 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-18 18:37:26 +00:00