Commit Graph

4615 Commits

Author SHA1 Message Date
sixcooler
2cf61a40ce fixed a bug from 7856, where Snippet returned an error by mistake when Metadata was found
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-02 16:50:05 +00:00
orbiter
610b01e1c3 - added a 'add every media object linked in a html document as a new document' to the html parser. This causes that all image, app, video or audio file that is linked in a html file is added as document. In fact that means that parsing a single html document may cause that a number of documents is inserted into the search index.
- some refactoring for mime type discovery

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 16:05:00 +00:00
orbiter
3da21c4266 protection against starting of a (second) yacy peer while another one is already running on the same port
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 13:13:21 +00:00
orbiter
3e6767d66c limitation of reference evaluation (protection against crawler pits)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 21:12:31 +00:00
orbiter
2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 10:35:25 +00:00
orbiter
9f9f634de2 fix in search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:12:48 +00:00
sixcooler
5f8a5ca32d - not doing merge-jobs while short on Memory
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:07:53 +00:00
orbiter
22d69a6368 refactoring in cora: added sorting package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:18:30 +00:00
orbiter
51cf697acd refactoring: moved all score-related classes to new ranking package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:37:53 +00:00
sixcooler
169236c6d9 almost revert changes in this class of 7880 and 7882
since MemoryControl does handle negative value requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:58:23 +00:00
sixcooler
4fec99115b Implementation of strategies for controlling memory resources.
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robust
but with the cost of early stopping some tasks (eg. dht) while running low on memory.
This new strategy does respect the generational way a heap is organized on most used jvms.
These changes run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be carefull using generation memory strategy and report errors by naming
OS, jvm and java_args.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:50:03 +00:00
orbiter
c64faf41e2 addon to svn 7880
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-15 11:07:03 +00:00
sixcooler
06408a9428 since many POST-requests come as gzip they report a contentlength of -1
request memory of -1 * 3 look useless to me
so I added some megs to it - even correct report of contentlength should not be harmed by this

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-13 01:04:37 +00:00
orbiter
594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:37:35 +00:00
sixcooler
eb14111200 encapsulate potential expensive objects in TextSnippet to allow GC them asap
this reduces chance of OOMs at massive search & snippet-fetching

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:07:52 +00:00
orbiter
e3fc1efbef performance hack and ensuring termination in serverAccessTracker. cause:
"Session_:53600#0_POST /yacy/hello.html HTTP/1.1" prio=10 tid=0x2322b000 nid=0x3ba7 runnable [0x03d3e000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Long.valueOf(Long.java:557)
        at de.anomic.server.serverAccessTracker.clearTooOldAccess(serverAccessTracker.java:113)
        at de.anomic.server.serverAccessTracker.cleanupAccessTracker(serverAccessTracker.java:75)
        - locked <0x3bda2ae8> (a de.anomic.server.serverAccessTracker)
        at de.anomic.server.serverAccessTracker.track(serverAccessTracker.java:125)
        at de.anomic.server.serverSwitch.track(serverSwitch.java:542)
        at de.anomic.http.server.HTTPDemon.parseRequestLine(HTTPDemon.java:641)
        at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:491)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at de.anomic.server.serverCore$Session.listen(serverCore.java:757)
        at de.anomic.server.serverCore$Session.run(serverCore.java:651)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-03 18:47:43 +00:00
orbiter
44d74f8f89 performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-03 18:32:11 +00:00
sixcooler
5cd07d7f84 early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
doing same thingy on other methods of touched files as well

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-02 15:52:33 +00:00
sixcooler
a311596881 finishing up my commits (7855-7858) which could be helpful for
not declaring inside loops (helps GC of some VMs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:35:24 +00:00
sixcooler
c0caca57e3 stoping thread for fetching searchresults if running short on memory
- in most cases at least one thread stays alive for getting the results
- fewer threads should do the work with less resouces, but much slower then

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:32:29 +00:00
sixcooler
ce248cc8dd less byte-arrays of response-content, less byte-array <-> stream conversation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:31:08 +00:00
sixcooler
59b767eebd stop loading via http at defined maximum of bytes - even size is unknown before loading
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:28:23 +00:00
f1ori
3a5fa73008 * revert parts of previous commit, because it breaks the trickle-feature
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 12:04:40 +00:00
f1ori
6e79675ff3 * use gzip-encoding in more cases
* send Expire-Header for static content
* should improve webserver-performance for slow connections
* fixes #37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 11:47:53 +00:00
sixcooler
aff875baef smaler ping-entry @ ProfilingGraph
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-15 09:14:21 +00:00
orbiter
1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own bye[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-15 08:38:10 +00:00
orbiter
be15874be1 added request line in http which can support better debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 11:00:38 +00:00
orbiter
11dc653de3 added a visualization of peer pings to the performance graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 07:07:06 +00:00
orbiter
3a191cdf14 because newbies are scared about the memory consumption in the performance graph and arguments about high memory consumption according to bad knowledge about java garbage collection techniques, the memory display had been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page where the memory graph is just like it was.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 03:25:57 +00:00
cominch
09bb7a390c do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-12 07:44:23 +00:00
orbiter
768c59740c - replaced solrj 3.1 with solrj 3.3
- updated also slf4j
- added authentication for solrj


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:35:30 +00:00
low012
c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file can not be edited yet, but it shoudl be easy to extend the CrawlProfileEditor accordingly.
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happend, but will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 23:55:55 +00:00
orbiter
719777b2a7 replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 18:13:37 +00:00
orbiter
2d4bb139d3 - added counting of links with noindex tag for solr index
- bugfixes for solr index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 06:40:05 +00:00
orbiter
892caccdca added default configuration in ConfigurationSet in case of new values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 00:09:49 +00:00
orbiter
bda3eec0ff added parsing of canonical link element to html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-01 16:38:01 +00:00
orbiter
b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:49:21 +00:00
f1ori
a17351dcfe * navigation bar for filetype constraints
javascript interpreted backslashes from urlmask as escaping and didn't forward them to yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:30:24 +00:00
f1ori
96957375cc * fix url proxy for relative links and chromium
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 09:32:02 +00:00
orbiter
9ebc75db4b fix for channel authorization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 23:14:02 +00:00
orbiter
6d9e5865ee faster appearance of search result page (but complete search time is the same)
this was inspired by http://bugs.yacy.net/view.php?id=37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:17:02 +00:00
orbiter
f7ca84cfc0 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:15:13 +00:00
orbiter
84c9658644 added a file type navigator
added a protocol navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 15:39:52 +00:00
orbiter
31283ecd07 - added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 11:57:17 +00:00
orbiter
4b425ffdd2 fix for http://bugs.yacy.net/view.php?id=41
added another RSS channel "PROXY". the rss feed for peer news filters this channel if there is not an authorized access on that channel


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-22 10:19:32 +00:00
orbiter
7db208c992 performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-21 23:10:50 +00:00
orbiter
87bd559c42 fixed warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:53:43 +00:00
orbiter
f30d36b101 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:02:06 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
sixcooler
7bfa6bb4b6 prevent getting a yacySeed from zero-length-hash-string by chance
(for eg.: proxy-crawls got displayed as initiated by some other peer)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 22:58:17 +00:00