Commit Graph

7577 Commits

Author SHA1 Message Date
orbiter
6db8921a0f enhanced termlist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:23:22 +00:00
orbiter
b5252ef91f added new word recommendation library in DictionaryLoader_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:14:17 +00:00
orbiter
1c007188ad bugfixes in html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-31 16:02:06 +00:00
orbiter
b00e69c5df removed test output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-29 09:16:41 +00:00
orbiter
231074bf0a fixed a parsing bug by reverting SVN 7766
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 22:59:19 +00:00
low012
ce11b7b6d2 *) Changed action to "" instead of "yacysearch.html". This should not do any harm, but helps a lot if the page is accessed not by its original name but by a different name which can be done by adding a symbolic link to the file system of the peer. (See http://www.yacy-forum.org/viewtopic.php?f=2&t=464)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 14:24:09 +00:00
low012
30a8a2f76b *) replacing one ugly hack with an extended ugly hack ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 13:32:42 +00:00
low012
95379ce0b1 *) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 12:59:11 +00:00
low012
c660f8862a *) changed links to be underlined again since lots of links were not obvious anymore
*) added SVN properties

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 01:14:58 +00:00
low012
24e76a7b69 *) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
*) Added description of where to place MediaWiki dump for import.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 00:16:36 +00:00
sixcooler
d40a177c05 Generation Memory Strategy fine tuning
add some log-output in termlist_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-27 15:23:24 +00:00
sixcooler
839f407fe4 Generation Memory Strategy fine tuning:
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 22:32:15 +00:00
orbiter
3e6767d66c limitation of reference evaluation (protection against crawler pits)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 21:12:31 +00:00
orbiter
a5541751a8 - added memory computation to termlist_p.xml
- added option to delete terms in termlist_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 19:13:45 +00:00
orbiter
45e497a9bd fix for term iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 18:29:30 +00:00
orbiter
9bdee5c71c added a servlet that produces a list of term hashes that appear more than 10000 times
see /api/termlist_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7898 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 16:49:20 +00:00
orbiter
5dd2efc9a2 - bugfixes in html parser
- new fields in solr
- extended file viewer to debug parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 15:52:25 +00:00
orbiter
2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 10:35:25 +00:00
orbiter
75df87832c refactoring/better naming of methods and classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 23:08:28 +00:00
orbiter
9f9f634de2 fix in search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:12:48 +00:00
sixcooler
5f8a5ca32d - not doing merge-jobs while short on Memory
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:07:53 +00:00
orbiter
965fabfb87 enhanced sorting speed (affects all DB operations)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 10:27:19 +00:00
orbiter
41a8ee4569 added iterable implementation in KeyList
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:23:40 +00:00
orbiter
22d69a6368 refactoring in cora: added sorting package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:18:30 +00:00
orbiter
51cf697acd refactoring: moved all score-related classes to new ranking package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:37:53 +00:00
orbiter
a0d5e7b6e6 added new score comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:33:08 +00:00
sixcooler
169236c6d9 almost revert changes in this class of 7880 and 7882
since MemoryControl does handle negative value requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:58:23 +00:00
sixcooler
4fec99115b Implementation of strategies for controlling memory resources.
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
but at the cost of stopping some tasks (e.g. DHT) early while running low on memory.
This new strategy respects the generational way a heap is organized on the most commonly used JVMs.
These changes have run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be careful when using the generation memory strategy and report errors, naming
OS, JVM and java_args.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:50:03 +00:00
sixcooler
63a375b801 do not look at external DTDs, because this makes the reader stay forever(?) on faulty DTD locations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:45:27 +00:00
orbiter
c39d63e7ad by default show only domain navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7884 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 10:19:15 +00:00
orbiter
2c58af6874 - added a short memory status simulation mode
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-17 22:24:17 +00:00
orbiter
c64faf41e2 addon to svn 7880
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-15 11:07:03 +00:00
sixcooler
7b7a196243 ignore cookies in httpclient per default
disable cookiestore, because the default one caused segfaults on my peers
this does not harm use of cookies via YaCy as proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-14 12:28:28 +00:00
sixcooler
06408a9428 since many POST requests come gzipped, they report a content length of -1
requesting memory of -1 * 3 looks useless to me
so I added some megs to it - even a correctly reported content length should not be harmed by this

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-13 01:04:37 +00:00
sixcooler
411ed159f8 do some extra sleep while running low on memory
(1 sec. per outofmemoryCycle)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-13 00:59:59 +00:00
orbiter
6361f1d875 select the search window on focus so it's easy to type in another query
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 18:07:46 +00:00
sixcooler
9ab0ba41e2 using GzipDecompressingEntity from httpclient instead of our own
(was just fixed there in httpclient-4.1.2 and does a proper job)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:51:30 +00:00
sixcooler
52b477cf6f bump to httpclient-4.1.2, httpcore-4.1.3 - bugfixrelease
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:42:32 +00:00
orbiter
ca09081341 better interaction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:13:34 +00:00
orbiter
3f0349e362 added a 'loading...' message
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 16:09:40 +00:00
orbiter
feac494f26 switch off real-time search if index is large
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 14:42:20 +00:00
sixcooler
07f5954570 try better handling of corrupt blobs
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 13:27:11 +00:00
orbiter
f970670a7c - bugfix in ServerScannerList
- speed up of generation of scanner list avoiding forced dns lookup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 13:21:18 +00:00
orbiter
8e03b8ee8b better integration of server list in interactive search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 12:25:45 +00:00
orbiter
606c5a9b40 added a servlet that shows all scanned servers inside of the yacyinteractive search page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 10:31:01 +00:00
orbiter
0a3ab7da1b do not sort the same array concurrently
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 08:06:21 +00:00
orbiter
77a9af99f1 same values for Xmx and Xms: memory extension may be difficult if the OS does not have the remaining memory available, and the OS may kill the JVM. If the memory is reserved at the start but never used, the OS may handle that as well and leave the unused space in the swap area (and never swap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7867 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:54:27 +00:00
orbiter
594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:37:35 +00:00
sixcooler
eb14111200 encapsulate potentially expensive objects in TextSnippet so they can be garbage-collected as soon as possible
this reduces the chance of OOMs during massive search & snippet-fetching

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:07:52 +00:00
cominch
3aa6528ed0 the form value was not correctly interpreted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 07:31:35 +00:00