Commit Graph

10322 Commits

Author SHA1 Message Date
Michael Peter Christen
1a4a69c226 set more logger to 'final static' 2013-11-13 06:18:48 +01:00
Michael Peter Christen
c60947360d logger should be static 2013-11-13 06:04:28 +01:00
Michael Peter Christen
69b8d61c47 fix for search requests in GSA interface which contain 'funny'
characters (like ':' etc.)
2013-11-12 15:54:54 +01:00
orbiter
b085cb522b replaced old existsByIds for embedded Solr with obviously much faster
new selection method (including still-existing debug code to test that
this is in fact better)
2013-11-11 11:25:01 +01:00
reger
1a6158e338 make test directory available in Maven pom
- exclude reference to old slf4j-log4j12
2013-11-10 22:20:35 +01:00
reger
b4fdb8c887 cleanup test directory from Jetty 9 implementation samples
- the current Jetty implementation keeps advancing, so it does not seem beneficial to keep the code,
as it makes the test unusable and use of Jetty 9 is not in sight due to its Java 1.7 dependency.
2013-11-10 22:01:31 +01:00
reger
b29d262e70 implement Jetty8HttpServerImpl.generateSocketAddress
(code 1:1 copied from serverCore)
2013-11-10 18:59:18 +01:00
orbiter
4234b0ed6c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-10 18:50:43 +01:00
orbiter
909bbb49d8 added (partly commented) test code for url rewrite methods .. to be
completed
2013-11-10 18:50:34 +01:00
orbiter
74c86a72a0 better default value for crawler user agent 2013-11-10 18:48:00 +01:00
reger
066a1ecf0a add highlight queryparams to solrservlet if missing
- modify query params in Solr parameter map (instead of querystring)
2013-11-10 01:36:57 +01:00
Michael Peter Christen
899e7e92b0 added debug code 2013-11-09 02:37:12 +01:00
Michael Peter Christen
a5c1249ee2 reverted autowarming setting in solrconfig 2013-11-09 01:43:44 +01:00
reger
4684330505 Merge origin/master into jetty
Conflicts:
	source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
2013-11-07 21:44:14 +01:00
reger
1437c45383 merge rc1/master 2013-11-07 21:30:17 +01:00
Michael Peter Christen
87a956e881 calculating and showing the number of files and the average size of a
file in the HTCACHE in ConfigHTCache_p.html
2013-11-07 12:13:12 +01:00
Michael Peter Christen
acc1f8a749 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-07 12:01:37 +01:00
Michael Peter Christen
81d9e23532 fixed another memory leak in the PDF parser:
the class org.apache.pdfbox.pdmodel.font.PDFont occupies 8MB of space
which cannot be cleaned if PDFont.clearResources is called.
The attempt to clean the class cache therefore causes that the class is
loaded and this cache is initialized with some rubbish. I tried to
prevent to instantiate this class by usage of a hacked findLoadedClass
call to the SystemClassLoader (which is protected ...).
Now, without using the PDF parser at all, 8MB of RAM space is not
occupied, however, when the first PDF arrives this space will be taken
and never given back to GC.
WAKE UP YOU LAZY PDFBOX HACKER AND FIX THIS SHIT!
2013-11-07 11:57:01 +01:00
Michael Peter Christen
c152d996e6 reduced footprint of BookmarksDB which can take quite a lot of memory if
the number of bookmarks is high (i.e. > 2000 URLs)
2013-11-07 10:55:02 +01:00
Michael Peter Christen
81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes them completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
2013-11-07 10:01:44 +01:00
reger
7b17cdf6dd add content_type:image/* to image search
- see numerous idx entries with content_type image without url_file_ext_s (for various reasons) which should be included in result
- try it yourself with following sample query
   /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type

also addresses possible URLs with a missing or deviating extension.
2013-11-07 03:11:03 +01:00
reger
082c9a98c1 move writeHeaders from Jetty8 servlet to YaCyDefaultServlet
- after removing Jetty server dependency (of Response using HttpServletResponse only)
2013-11-07 00:32:21 +01:00
sixcooler
987f410011 URL-export: add query and fix for cast-class-exception 2013-11-06 19:22:26 +01:00
Michael Peter Christen
ffe8276063 replaced referrer link masking to 'pure' links to the referring page
(that was more useful during testing)
2013-11-06 18:05:46 +01:00
Michael Peter Christen
a8253ca49c added missing unicode transformation in href link contents during
parsing
2013-11-06 18:05:02 +01:00
Michael Peter Christen
0cf9e9580b added clickdepth and CR computation debug code to verify that the
process is complete
2013-11-06 15:01:40 +01:00
Michael Peter Christen
7f768b42d3 we do not need the load-image flag any more since this is now controlled
by parser switches
2013-11-06 15:00:57 +01:00
reger
b85f702f22 add AccessTracker logging to SolrServlet 2013-11-05 22:57:55 +01:00
reger
de1f02420b implement HtmlResponseWriter to solrServlet (and rss / opensearch responsewriter) as in yacy select servlet.
- set contenttype of HTML/GrepHTML-Responsewriter to "text/html"
- set a contenttype to GSAsearchServlet
2013-11-04 21:11:12 +01:00
Michael Peter Christen
234a974955 load images only if their parser flag is activated 2013-11-04 11:59:28 +01:00
Michael Peter Christen
b2c329929f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-04 10:18:52 +01:00
Michael Peter Christen
60187a4ec2 fix in html parser 2013-11-04 10:16:20 +01:00
Michael Peter Christen
e1c1e57877 less overhead calling exist() with only one hash 2013-11-04 09:37:31 +01:00
reger
3d5d366f1c fix html header in Solr HTMLResponseWriter
- move 1st body content after </head> tag
- add closing <span> tag
2013-11-04 03:12:02 +01:00
reger
bfdb404867 implement a Jetty reconnect to work with Configbasic_p.html port change
- instead of shutting down the server it should be sufficient to manipulate the Jetty http connector
2013-11-03 21:34:21 +01:00
Michael Peter Christen
5a02d650ee avoid cloning 2013-11-03 18:31:50 +01:00
reger
8ec350bad2 upd Maven pom (take back introduced java-templates)
to handle filtering of yacyBuildProperties.java.
To keep it compatible with ant filter directly from original source/.... location.
2013-11-03 02:38:36 +01:00
reger
d6760df3e5 fix servlet class exist check to use default path only (in Jetty8YaCyDefaultServlet)
- del redundant doget code in yacydefaultservlet
   - small declaration code opts
- del obsolete libt/proxyservlet.java
2013-11-03 02:26:00 +01:00
reger
9da87c0c7f update Maven build script
- use current YaCy version number
- make use of libbuild\GitRevMavenTask (maven-plugin-gitrevisionnumber)
- make yacyBuildProperties.java available for source filtering by Maven-plugin (copy to libbuild\java-templates)
- update assembly definition to include lib\yacycore.jar without version number (needed this way by startupscript)
2013-11-02 06:27:18 +01:00
reger
62c591ffd1 add Maven plugin to return a YaCy style Git repository build release number and timestamp
- it injects properties which can be used in pom via ${DSTAMP} ${releaseNr} if added as plugin via
<plugin>
  <groupId>net.yacy</groupId>
  <artifactId>maven-plugin-gitrevisionnumber</artifactId>
  <version>1.0</version>
  <executions>
    <execution>
      <phase>initialize</phase>
      <goals><goal>create</goal></goals>
    </execution>
  </executions>
</plugin>
2013-11-02 02:33:06 +01:00
reger
b38de92a16 Merge origin/master into jetty 2013-11-02 00:48:42 +01:00
reger
a09e70cd68 fix typo in GitRevTask (branch) 2013-11-02 00:18:24 +01:00
Michael Peter Christen
cc39667399 Speed enhancements and less CPU usage during Solr searches when using
the embedded Solr (the default). This was obtained by circumventing solrj
search encapsulation and the implementation of direct index access
methods to Solr.
The effect will not only be seen during search, but this has also a
strong effect on suggestions (much more) and less CPU power usage during
index distribution (which needs many search requests)
2013-11-01 17:24:36 +01:00
Michael Peter Christen
434e13b46d in host browser also show the properties of failed documents including
referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)
2013-11-01 13:30:53 +01:00
orbiter
176acce5cb version number change for next development cycle 2013-10-31 16:20:33 +01:00
orbiter
1ac504ae51 use html encoding for urls in metadata 2013-10-31 16:16:29 +01:00
reger
6944225037 - add GSA search /gsa/search servlet for Jetty to Server init
- include SecurityHandler check for /gsa/ /solr/ 
- change one more YaCyDefaultServlet dependency from Jetty to std. javax.Servlet
2013-10-30 23:11:36 +01:00
reger
ec3c0582ae update Maven pom and jar dependencies 2013-10-30 01:13:12 +01:00
reger
53cb30a221 reduce logging (by assigning logger to existing logger)
- small additional cleanups
2013-10-30 00:51:04 +01:00
reger
332c6d4fe1 reactivate Domain handler for .yacy / .yacyh handling 2013-10-27 19:15:20 +01:00