Commit Graph

8736 Commits

Author SHA1 Message Date
Michael Peter Christen
2160f9a819 Release 1.04 2012-07-09 00:13:59 +02:00
Michael Peter Christen
1addbc792c use less memory for md5 cache 2012-07-08 22:05:04 +02:00
Michael Peter Christen
f32de94723 more logging 2012-07-08 22:04:36 +02:00
Michael Peter Christen
d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
2012-07-08 21:25:22 +02:00
Michael Peter Christen
434ee90c59 added classification for control file types which shall not be loaded
but placed onto the noload-queue
2012-07-08 21:17:33 +02:00
Michael Peter Christen
1517a3b7b9 added webm mime-type 2012-07-08 17:59:20 +02:00
Michael Peter Christen
a90bcb48f6 added webm 2012-07-08 17:58:05 +02:00
Michael Peter Christen
801972fe6f fix for url camel case parser and sentence reader 2012-07-08 16:48:09 +02:00
Michael Peter Christen
fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within
small memory configurations
2012-07-08 16:11:50 +02:00
Michael Peter Christen
92731e5287 fix for sevenzip parser 2012-07-08 16:11:19 +02:00
Michael Peter Christen
45641b0c23 catch and log a warning in RasterPlotter 2012-07-06 09:21:12 +02:00
Michael Peter Christen
8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
2012-07-06 09:05:41 +02:00
Michael Peter Christen
c3db015410 prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
2012-07-06 08:29:41 +02:00
Michael Peter Christen
91f14ea38e fix to solr configuration (case where the external solr was not online) 2012-07-06 01:29:13 +02:00
sixcooler
2c5b68d932 more abstraction of error message 2012-07-05 14:50:37 +02:00
Michael Peter Christen
9758c521ab abstraction of error message 2012-07-05 14:27:28 +02:00
Michael Peter Christen
ef0d09f103 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-05 14:24:19 +02:00
Michael Peter Christen
b1e7c11fba fix for pattern matcher in html parser 2012-07-05 14:24:03 +02:00
Michael Peter Christen
8a6edc0031 fix for solr shutdown 2012-07-05 14:23:43 +02:00
Michael Peter Christen
b8bcc06283 fix for urls beginning with "//" 2012-07-05 14:23:29 +02:00
sixcooler
9b6e4e46ca fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430 2012-07-05 14:06:00 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
5bd3c90907 - removed unnecessary semicolons
- added default case for switch
2012-07-05 11:18:31 +02:00
Michael Peter Christen
132afaf687 removed inaccessible code 2012-07-05 11:09:44 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
83701a1b4c removed unused ImageReference package 2012-07-05 10:24:52 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
241dd8410a removed snippet pattern filter - it was not used 2012-07-05 09:21:27 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
Michael Peter Christen
1481037820 replaced non-generic array with collection 2012-07-05 01:02:51 +02:00
Michael Peter Christen
4de50fe808 adding more principal peers for bootstrapping 2012-07-05 00:43:41 +02:00
orbiter
fc0f9543fe More SentenceReader cleanup 2012-07-05 00:20:58 +02:00
orbiter
586bb0eb6a Simplified SentenceReader (no more Reader inside..) 2012-07-04 22:06:20 +02:00
orbiter
7f851d62a7 replaced HashARC with SizeLimited Objects which are less costly 2012-07-04 21:56:25 +02:00
orbiter
d4291ac1f3 more tolerance when creating solr document 2012-07-04 21:15:38 +02:00
orbiter
78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used an overly complex input stream, generated from sources and UTF-8
byte conversions. The BufferedReader added considerable overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls had
been applied until a buffered Reader from an input stream was possible.
These superfluous conversions have been removed.
- the best source for the SentenceReader is a String. Therefore the
production of Strings is now forced inside the Document class.
2012-07-04 21:15:10 +02:00
orbiter
bb8dcb4911 automatically adapt size of word cache to available memory 2012-07-03 18:22:25 +02:00
Michael Peter Christen
ad09b786bf clean up parser data 2012-07-03 17:20:41 +02:00
Michael Peter Christen
276a66a793 Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that they can easily cause an OOM just by the number of links.
The question of whether the limit of 1000 links is sufficient or too weak must
be answered by testing this feature.
2012-07-03 17:06:20 +02:00
Michael Peter Christen
613b45f604 - better data structures in secondary search
- fixed a big memory leak in secondary search
2012-07-03 07:12:20 +02:00
Michael Peter Christen
de903a53a0 parser refactoring & hacks 2012-07-03 06:06:38 +02:00
Michael Peter Christen
8a82609360 - smaller caches to save memory
- close cloneable iterators to free memory
2012-07-02 15:40:40 +02:00
Michael Peter Christen
7249d9c9de bugfix for concurrent seed loader 2012-07-02 14:37:57 +02:00
Michael Peter Christen
c72d3b12cd concurrently initialize the seed list during p2p network bootstrap 2012-07-02 14:27:37 +02:00
Michael Peter Christen
1825f165b8 better integration of blacklist according to use case 2012-07-02 13:57:29 +02:00
Michael Peter Christen
c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-07-02 12:20:57 +02:00
Michael Peter Christen
ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. 2012-07-02 10:27:46 +02:00
Michael Peter Christen
0c345d1559 giving threads names so it's easier to see what's happening during
debugging and within a thread dump
2012-07-02 09:51:43 +02:00
reger
067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every externally linked page of displayed search result pages) 2012-07-01 00:12:20 +02:00