Commit Graph

790 Commits

Author SHA1 Message Date
sixcooler
ecb4986b38 refactored stuff from last commit to ReferenceContainer
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled per default
to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 18:55:16 +00:00
sixcooler
f7c4abfdd7 limit references per blob & term to the 100.000 youngest
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 13:08:06 +00:00
orbiter
28f5b79deb added a fast mass-deletion method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 11:42:06 +00:00
orbiter
a70dbce41c added another file tool class to yacy-cora
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:09:35 +00:00
orbiter
49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), the all embedded image, audio and video links from all parsed documents are added to the search index as individual document. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:08:57 +00:00
orbiter
e02bfbde56 fix for solr url
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 09:07:40 +00:00
orbiter
580beb12a5 reverting SVN 7863; the synchronization was needed and no synchronization causes repeated DNS lookup for the same hosts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-05 00:26:27 +00:00
orbiter
44d6416e2d ensure termination of shrink()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-05 00:22:21 +00:00
orbiter
52230a6864 replaced catching of Exception with Throwable, which catches also Errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-05 00:09:48 +00:00
orbiter
877eaf6bcb switched off logging of org.apache.http which was suddenly switched on by default (??)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-04 23:28:15 +00:00
orbiter
e1a3d609aa moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-04 23:27:12 +00:00
orbiter
610b01e1c3 - added a 'add every media object linked in a html document as a new document' to the html parser. This causes that all image, app, video or audio file that is linked in a html file is added as document. In fact that means that parsing a single html document may cause that a number of documents is inserted into the search index.
- some refactoring for mime type discovery

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 16:05:00 +00:00
orbiter
3da21c4266 protection against starting of a (second) yacy peer while another one is already running on the same port
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 13:13:21 +00:00
orbiter
b5252ef91f added new word recommendation library in DictionaryLoader_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:14:17 +00:00
orbiter
1c007188ad bugfixes in html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-31 16:02:06 +00:00
orbiter
231074bf0a fixed a parsing bug by reverting SVN 7766
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 22:59:19 +00:00
low012
30a8a2f76b *) replacing one ugly hack with an extended ugly hack ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 13:32:42 +00:00
low012
95379ce0b1 *) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 12:59:11 +00:00
low012
24e76a7b69 *) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
*) Added description of where to place MediaWiki dump for import.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 00:16:36 +00:00
sixcooler
d40a177c05 Generation Memory Strategy fine tuning
add some log-output in termlist_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-27 15:23:24 +00:00
sixcooler
839f407fe4 Generation Memory Strategy fine tuning:
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 22:32:15 +00:00
orbiter
a5541751a8 - added memory computation to termlist_p.xml
- added option to delete terms in termlist_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 19:13:45 +00:00
orbiter
45e497a9bd fix for term iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 18:29:30 +00:00
orbiter
5dd2efc9a2 - bugfixes in html parser
- new fields in solr
- extended file viewer to debug parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 15:52:25 +00:00
orbiter
2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 10:35:25 +00:00
orbiter
75df87832c refactoring/better naming of methods and classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 23:08:28 +00:00
sixcooler
5f8a5ca32d - not doing merge-jobs while short on Memory
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:07:53 +00:00
orbiter
965fabfb87 enhanced sorting speed (affects all DB operations)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 10:27:19 +00:00
orbiter
41a8ee4569 added iterable implementation in KeyList
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:23:40 +00:00
orbiter
22d69a6368 refactoring in cora: added sorting package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:18:30 +00:00
orbiter
51cf697acd refactoring: moved all score-related classes to new ranking package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:37:53 +00:00
orbiter
a0d5e7b6e6 added new score comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:33:08 +00:00
sixcooler
4fec99115b Implementation of strategies for controlling memory resources.
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robust
but with the cost of early stopping some tasks (eg. dht) while running low on memory.
This new strategy does respect the generational way a heap is organized on most used jvms.
These changes run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be carefull using generation memory strategy and report errors by naming
OS, jvm and java_args.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:50:03 +00:00
sixcooler
63a375b801 do not look at external dtd, cause this make this reader stay forewer(?) on on faulty dtd-locations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:45:27 +00:00
orbiter
2c58af6874 - added a short memory status simulation mode
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-17 22:24:17 +00:00
orbiter
c64faf41e2 addon to svn 7880
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-15 11:07:03 +00:00
sixcooler
7b7a196243 ignore cookies in httpclient per default
disable cookiestore,cause the default one caused segfaults on my peers
this does not harm use of cookies via YaCy as proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-14 12:28:28 +00:00
sixcooler
411ed159f8 do some extra sleep while running low on memory
(1 sec. per outofmemoryCycle)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-13 00:59:59 +00:00
sixcooler
9ab0ba41e2 using GzipDecompressingEntity from httpclient instead of our own
(was just fixed there in httpclient-4.1.2 and does a proper job)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:51:30 +00:00
sixcooler
07f5954570 try better handling of corrupt blobs
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 13:27:11 +00:00
orbiter
f970670a7c - bugfix in ServerScannerList
- speed up of generation of scanner list avoiding forced dns lookup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 13:21:18 +00:00
orbiter
8e03b8ee8b better integration of server list in interactive search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 12:25:45 +00:00
orbiter
0a3ab7da1b do not sort concrrently the same array
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 08:06:21 +00:00
sixcooler
eb14111200 encapsulate potential expensive objects in TextSnippet to allow GC them asap
this reduces chance of OOMs at massive search & snippet-fetching

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:07:52 +00:00
orbiter
0d33cf352b removed synchronization in DNS resolve (solves a problem when loading snippets but in the past concurrent dns requests also caused deadlocks. but this is many years ago and we will give it another try)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-03 19:42:18 +00:00
orbiter
44d74f8f89 performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-03 18:32:11 +00:00
sixcooler
5cd07d7f84 early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
doing same thingy on other methods of touched files as well

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-02 15:52:33 +00:00
sixcooler
a311596881 finishing up my commits (7855-7858) which could be helpful for
not declaring inside loops (helps GC of some VMs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:35:24 +00:00
sixcooler
9170a434ed throwing an exception again in FileUtils.copy(reader, writer)
OOMs could occour here and should not be ignored

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:32:58 +00:00
sixcooler
ce248cc8dd less byte-arrays of response-content, less byte-array <-> stream conversation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:31:08 +00:00
sixcooler
59b767eebd stop loading via http at defined maximum of bytes - even size is unknown before loading
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:28:23 +00:00
sixcooler
916d79111e Runtime.maxMemory() DOES change @ runtime:
I wondered getting Total-ram > Max-ram and MemoryControl.available() < 0
MemoryControl.available() < 0 causes some errors where its value is used for dimension of buffers for eg.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 12:48:50 +00:00
orbiter
299af4943c added another memory protection hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-17 17:55:08 +00:00
orbiter
1f300217f8 more protection for the cleanup thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7848 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-17 08:39:39 +00:00
orbiter
d13103a0a7 changed the way how the index cache is flushed: do not flush when a put was made because that could cause that many put calls synchronize for a long time when the dump or a merge is performed. Instead a watchdog thread is doing the dump and therefore puts cannot block any more which is good when a put happens during a search result preparation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-17 00:02:42 +00:00
orbiter
b06faab9d3 do not allocate a StringBuilder object in case that there is not enough memory for that
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-16 23:17:19 +00:00
orbiter
6a6f27eaf3 do not sort arrays again if arrays are already sorted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-16 19:21:39 +00:00
orbiter
3d043ce9d6 - refactoring
- do not start worker threads in Array class if concurrency is not used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-16 19:13:30 +00:00
orbiter
48b78e9ff4 disabling concurrency in new sort since that is not working yet correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-16 11:54:47 +00:00
orbiter
62ac73a108 fixed bugs and deadlocks in core database indexing structures:
- added new Array class that contains an abstraction of the java Arrrays class which replaces the home-brew quicksort algorithm.
- the new class is about four times slower than the old one, but it works correct (the old one had errors)
- fixed a synchronization problem

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-16 10:08:43 +00:00
orbiter
1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own bye[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-15 08:38:10 +00:00
orbiter
bb8e3f8523 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 21:42:30 +00:00
orbiter
11dc653de3 added a visualization of peer pings to the performance graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 07:07:06 +00:00
orbiter
3a191cdf14 because newbies are scared about the memory consumption in the performance graph and arguments about high memory consumption according to bad knowledge about java garbage collection techniques, the memory display had been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page where the memory graph is just like it was.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 03:25:57 +00:00
orbiter
52d799e7c8 fix for solr auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7833 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-05 09:21:30 +00:00
orbiter
9eb8e9acd9 no error message about missing browser in headless environments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-05 06:54:05 +00:00
orbiter
d3c89b90ce temporary adding the old httpclient-3.1 again because the solrj classes need them. should be removed as soon solrj supports httpclient-4
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 17:04:49 +00:00
orbiter
bd99969758 fixed bad query
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:53:18 +00:00
orbiter
768c59740c - replaced solrj 3.1 with solrj 3.3
- updated also slf4j
- added authentication for solrj


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:35:30 +00:00
low012
c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file can not be edited yet, but it shoudl be easy to extend the CrawlProfileEditor accordingly.
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happend, but will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 23:55:55 +00:00
orbiter
6d2e252bcf fix for:
java.lang.NullPointerException
	at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97)
	at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48)
	at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58)
	at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69)
	at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43)
	at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023)
	at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922)
	at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869)
	at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267)
	at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239)
	at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 20:44:33 +00:00
orbiter
2d4bb139d3 - added counting of links with noindex tag for solr index
- bugfixes for solr index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 06:40:05 +00:00
orbiter
892caccdca added default configuration in ConfigurationSet in case of new values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 00:09:49 +00:00
orbiter
bda3eec0ff added parsing of canonical link element to html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-01 16:38:01 +00:00
orbiter
b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:49:21 +00:00
orbiter
b666a929e7 fixed Semaphore handling in case of interruptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:37:14 +00:00
orbiter
de7a054d77 added parser for such files like the new solr.key.list
it parses text files with the following syntax:
 - all lines beginning with '##' are comments
 - all non-empty lines not beginning with '#' are keyword lines
 - all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:35:45 +00:00
orbiter
267290a821 removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere which then causes that the cache is never flushed and then the peer dies from a OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:45:04 +00:00
orbiter
d8072d1866 added more info to DNS cache in /PerformanceMemory_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-24 08:27:36 +00:00
orbiter
f803da8aae code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-24 00:24:00 +00:00
orbiter
84c9658644 added a file type navigator
added a protocol navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 15:39:52 +00:00
orbiter
31283ecd07 - added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 11:57:17 +00:00
orbiter
7db208c992 performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-21 23:10:50 +00:00
orbiter
07e89a7ae5 added @Deprecated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:33:45 +00:00
orbiter
9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:29:45 +00:00
orbiter
996f0a8764 disabled assert in Base64Order which eats away too much performance during testing with -l
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:34:55 +00:00
orbiter
f667b9c289 enhanced identificator: using AtomicInteger for counter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:31:10 +00:00
orbiter
16327d1cbe unwrapping of call depth (one call less for UTF8.String)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:15:01 +00:00
orbiter
f30d36b101 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:02:06 +00:00
orbiter
aa6c32d753 enhanced UTCDiffString
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7782 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 12:38:06 +00:00
f1ori
f87865a50b always shutdown log, fixes zombie processes in init stop script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-15 09:14:51 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
orbiter
77fe69395d added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 20:04:41 +00:00
sixcooler
df1725ef43 re-enable POST over proxy, which didn't work since update to httpcore-4.1.1
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-04 13:25:03 +00:00
orbiter
2683162ec5 - added more options to access grid picture, web structure picture and network graphics
- remove test class


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-02 23:27:26 +00:00
orbiter
0c1b29f3c9 - applied many small performance hacks
- added a memory limitation in the zip parser and the pdf parser
- added a search throttling: if there are too many search queries are still to be computed, then new requests are not accepted for some time. if after a one second still no space is there to perform another search, the search terminates with no results. this case should only happen in case of DoS-like situations and in case of strong load on a peer like if it is integrated in metager.
- added a search cache deletion process that removes search requests in case that throttling happens

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-01 19:31:56 +00:00
orbiter
fe0c08455b more concurrency (enhancement) hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 08:53:58 +00:00
orbiter
87082f407e less String object creation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 04:19:20 +00:00
orbiter
3c2b994bd6 write access/load time to solr index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7752 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 12:35:08 +00:00
orbiter
a36fda991e hack to increase speed of url hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 12:34:38 +00:00