reger
3b559e7846
optimize pdfParser
...
skip starting reader thread if all content already read
2014-06-10 04:25:20 +02:00
reger
09f73b790f
fix pdfParser not closed warning from pdfbox
...
for encrypted pdf on exit due to missing permission to extract
2014-06-08 08:20:30 +02:00
reger
c798a9d1bb
fix unresolved pattern in yacysearch.rss title
...
and rss xml error due to html & encoding in url entries
2014-06-07 03:01:26 +02:00
reger
92d1604a31
Crawler hostbalancer does not delete finished queue files,
...
use alternative delete to fight the symptom (and fix deletion of host dirs on startup)
Root cause (which class holds a lock on .stack) not found.
http://mantis.tokeek.de/view.php?id=404
2014-06-05 02:13:08 +02:00
Michael Peter Christen
e64be5dcad
in case that the network is switched to any other than freeworld, RWIs
...
are disabled. This is a temporary fix. There must be a better way to
determine if RWIs are to be switched on or off.
2014-06-04 13:59:37 +02:00
Michael Peter Christen
0c324d735c
NPE fix for postprocessing without term index
2014-06-04 12:28:28 +02:00
Michael Peter Christen
87f171675b
doing index deletions using a get string which makes it easier to
...
copy-paste deletion examples (see: #EuGH :( )
2014-06-04 12:09:49 +02:00
Michael Peter Christen
a2f800cd8f
fix for bad String conversion
2014-06-04 12:07:07 +02:00
Michael Peter Christen
922979aae1
added option to prefer http over https in unique-protocol ranking
2014-06-02 17:40:56 +02:00
Michael Peter Christen
b3b174e2b8
fixed webgraph postprocessing and status display in Crawler_p servlet
2014-06-02 15:06:38 +02:00
Michael Peter Christen
3b53bee90f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-06-02 13:12:08 +02:00
Michael Peter Christen
e6b28f5958
removed check on protocol for double content (user request)
2014-06-02 13:11:44 +02:00
reger
7a52a6ba3f
add links to port config in status panel
...
- pom upd to match javadoc location
2014-06-02 02:11:54 +02:00
Michael Peter Christen
b803622ac3
changed javadoc publishing path from 'api' to 'javadoc' because there
...
are also other APIs in YaCy.
2014-06-01 22:25:55 +02:00
reger
d8d318233e
fix logging settings
...
- add missing .level
- remove obsolete jena settings
- set default level=INFO to prevent debug logging of classes not explicitly specified
2014-06-01 06:43:50 +02:00
reger
c3e40c82fe
make https port setting changeable via front end somewhere
...
(chosen Http Networking page /Settings_p.html?page=http )
2014-06-01 03:15:38 +02:00
Michael Peter Christen
698f053658
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-06-01 01:02:12 +02:00
Michael Peter Christen
f23c4142e0
added option to configure a custom user agent within allip networks
2014-06-01 01:02:03 +02:00
reger
8e233e2eb4
- fix typo in Message_p (defaultpath)
...
- use more existing switchboardconstants for getproperties
- replace deprecated call defaultservlet
2014-06-01 00:20:25 +02:00
orbiter
d7d38f9135
made number of open files in crawler configurable and increased default
...
maximum number of open files from 100 to 1000. This number can be
changed with the attribute crawler.onDemandLimit
2014-05-31 09:29:55 +02:00
Michael Peter Christen
20cffa34bf
Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
2014-05-30 16:13:09 +02:00
malykhin.dmitry
873f8c2d2c
Update russian translation
2014-05-30 07:12:56 +04:00
Michael Peter Christen
c43acb0e80
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-29 13:24:44 +02:00
Michael Peter Christen
8ad41a882c
fixed several problems with postprocessing:
...
- unique-postprocessing was destroying results from other
postprocessings; removed cross-updates as they were not necessary
- unique-postprocessing did not restrict on same protocol
- inefficient concurrent update cache was redesigned completely
- increased limits for concurrent blocking queues to prevent early
time-out
2014-05-29 13:24:24 +02:00
sixcooler
370f1c408e
Changed Windows Firewall Rules to just honor the default Port 8090, but
...
not use any program path.
This should match more installations in different paths and also running
YaCy as service (prunsrv).
This commit was contributed and tested on Windows7 by René.
2014-05-29 00:01:48 +02:00
Michael Peter Christen
640b684bb6
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-28 19:19:17 +02:00
Michael Peter Christen
2f5477ea59
an attempt to fix the mixed up terms 'Active' -> 'Senior' and 'Passive' ->
...
'Junior'
2014-05-28 18:48:54 +02:00
reger
ca5437dd50
fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149
...
local files can be crawled (intranet mode) url parsing fixed according to RFC 1738 (for unix and windows)
for win like file:///c:/tmp or file://localhost/c:/tmp
for linux like file:///tmp or file://localhost/tmp
Host is ignored and path must be absolute
2014-05-28 03:01:34 +02:00
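The file:// forms listed in the commit above can be illustrated with a minimal sketch. It uses the standard java.net.URI class, not YaCy's own URL parser; the class FileUrlSketch and the method localPath are hypothetical names introduced only for this example. The sketch assumes the rule stated in the commit message: the host is ignored and the path must be absolute.

```java
import java.net.URI;

// Illustrative only: hypothetical helper, not the parser used in YaCy.
class FileUrlSketch {

    // Returns the absolute path of a file:// URL, ignoring any host part.
    static String localPath(String url) {
        URI uri = URI.create(url);
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a file URL: " + url);
        }
        // The host (empty or "localhost") is deliberately not inspected.
        String path = uri.getPath();
        // An opaque URI such as "file:tmp" has no path component at all.
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + url);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(localPath("file:///c:/tmp"));          // Windows style
        System.out.println(localPath("file://localhost/c:/tmp")); // host ignored
        System.out.println(localPath("file:///tmp"));             // Unix style
        System.out.println(localPath("file://localhost/tmp"));    // host ignored
    }
}
```

With this approach the Windows forms keep a leading slash before the drive letter (/c:/tmp); turning that into a native path is left out of the sketch.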
Michael Peter Christen
9b4282344b
changed debian dependency to openjdk-7-jre-headless
2014-05-27 18:57:05 +02:00
Michael Peter Christen
ff5b3ac84d
added new fields http_unique_b and www_unique_b which can be used for
...
ranking to prefer urls containing a www subdomain or using the https
protocol
2014-05-27 15:28:28 +02:00
reger
66f6797f52
make config search page layout closer to actual page appearance
2014-05-25 01:06:39 +02:00
reger
9ecf28b708
- upd pom to Solr 4.8.1 and latest jar updates
...
- upd nsis java autodownload package to jre 7u55
2014-05-24 01:01:27 +02:00
Michael Peter Christen
06f2eeda22
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-23 00:50:10 +02:00
Michael Peter Christen
5d5896b3f6
fixed dependency in debian package on java 7
2014-05-23 00:49:50 +02:00
sixcooler
5b1c4ef191
Monitoring and limit connection-count for Jetty
2014-05-22 22:16:39 +02:00
sixcooler
046e41e376
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-22 21:59:57 +02:00
orbiter
ee7416816b
upgraded poi library (office document format parser) from 3.9 to 3.10
2014-05-22 15:53:07 +02:00
orbiter
ce1dbfeb0f
fix appearance of image search thumbnails.
2014-05-22 15:01:58 +02:00
orbiter
6daae59479
switch on core.service.rwi when switching back from portal mode to p2p
...
mode
2014-05-22 12:55:22 +02:00
orbiter
a12701ddf6
upgraded Bouncy Castle libraries (needed for encrypted pdfs, dependency
...
in pdfbox) to 1.46
removed the activation.jar library; I don't know which other library
depends on it.
2014-05-22 12:09:21 +02:00
Michael Peter Christen
f0db501630
better handling of ranking parameters and new default values for date
...
navigation which is done using ranking in solr.
2014-05-22 03:01:07 +02:00
Michael Peter Christen
53948da7d0
tried to make last_modified recognition smarter
2014-05-22 00:28:51 +02:00
Michael Peter Christen
2d03037965
'Last-Modified', not 'Last-modified' according to
...
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
2014-05-21 23:21:31 +02:00
Michael Peter Christen
2520590b45
migrated from pdfbox 1.8.4 to 1.8.5. They have a very long bugfix list
...
for that update:
http://www.apache.org/dist/pdfbox/1.8.5/RELEASE-NOTES.txt
2014-05-21 22:48:41 +02:00
sixcooler
2d508618a4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-21 20:57:47 +02:00
Michael Peter Christen
3dc5fb0050
fix for operator precedence bug (cast binds stronger than bitwise AND)
...
in peer hash hashing. This should not change anything if Java casts long
to int by masking with 0xFFFFFFFFL, but you never know. The important
thing is that the hashCode() should not return numbers that have the
same order as the hash code order because hashing of seeds is used to
remove the order in some places.
2014-05-21 18:37:52 +02:00
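The precedence issue named in the commit above (cast binds stronger than bitwise AND) can be shown with a small sketch. The class and method names are hypothetical and this is not the YaCy hashing code; it only contrasts the two ways such an expression can be parsed. In Java a narrowing conversion from long to int keeps the low 32 bits, which is why both forms agree once the result is narrowed to int.

```java
// Illustrative only: hypothetical helpers, not the YaCy peer hash code.
class CastPrecedenceSketch {
    static final long MASK = 0xFFFFFFFFL;

    // Parsed as ((int) x) & MASK: the cast is applied first, then the int is
    // widened back to long for the AND, so the result is a long.
    static long castThenMask(long x) {
        return (int) x & MASK;
    }

    // Explicit parentheses: mask first, then narrow the result to int.
    static int maskThenCast(long x) {
        return (int) (x & MASK);
    }

    public static void main(String[] args) {
        long x = 0x180000001L;
        System.out.println(castThenMask(x));  // a non-negative long
        System.out.println(maskThenCast(x));  // an int, possibly negative
        // After narrowing to int, both variants carry the same 32 bits.
        System.out.println((int) castThenMask(x) == maskThenCast(x));
    }
}
```

The two variants differ in type and sign before narrowing, so the parenthesized form is the safer one to read and maintain when the value is meant to be an int hash code.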
Michael Peter Christen
6634b5b737
debug code for index distribution testing
2014-05-21 18:20:16 +02:00
Michael Peter Christen
89e13fa34e
fixed bug in test function
2014-05-21 15:31:47 +02:00
sixcooler
bf2ae57126
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-21 13:30:48 +02:00
sixcooler
275def648b
Revert "manual merge"
...
This reverts commit 3bfab8566c.
2014-05-21 13:29:46 +02:00