11207 Commits

Author SHA1 Message Date
Michael Peter Christen
01bbb20666 increased default logging line count to max 2014-08-06 12:40:35 +02:00
Michael Peter Christen
6344718f8b reducing the concurrent query stack size and reduced concurrency of
postprocessing to avoid OOM situations
2014-08-06 12:36:59 +02:00
Michael Peter Christen
eca9380e3d bugfix for crawler double-check: if a URL is redirected, the
redirect-target was not double-checked. This is now done by replacing
the redirect-URL on the crawl queue again (where it is double-checked)
2014-08-06 12:35:12 +02:00
Michael Peter Christen
9ac0c93f17 fix for subpath crawl filter 2014-08-06 01:33:24 +02:00
Michael Peter Christen
9bc3e457dd fix for termination of all crawls 2014-08-05 22:23:52 +02:00
Michael Peter Christen
66106bdaf0 fix for crawler attribute maxdompages 2014-08-05 21:32:25 +02:00
Michael Peter Christen
49d91b94c3 npe fix in crawler 2014-08-05 21:31:59 +02:00
Michael Peter Christen
8d650ca225 added hint to port forwarding videos 2014-08-05 21:31:28 +02:00
Michael Peter Christen
b7183a7321 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-05 09:54:18 +02:00
reger
ea2e627662 fix ConfigAccounts del user with uppercase letter in name
(usernames are case sensitive, userdb.delete used toLower)
2014-08-05 01:27:27 +02:00
Michael Peter Christen
c465b791af typo 2014-08-04 16:13:39 +02:00
Michael Peter Christen
191ec8c82a added concurrency to postprocess rewrite process 2014-08-04 15:28:58 +02:00
Michael Peter Christen
a1e8bdd5e9 log ppm instead of docs/second 2014-08-04 14:44:42 +02:00
Michael Peter Christen
cc0ded7abd set process type of web graph according to fields as defined in the
schema
2014-08-04 14:44:20 +02:00
Michael Peter Christen
12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
performed
2014-08-04 14:19:37 +02:00
Michael Peter Christen
3c23b89823 less logging 2014-08-04 13:37:34 +02:00
Michael Peter Christen
a0c53174c5 better solr query logging to detect unnecessary sort requests for more
performance profiling
2014-08-04 13:00:45 +02:00
Michael Peter Christen
338f574bdc no sorting if http/www unique fields are not demanded (makes query
faster) and some code restructuring
2014-08-04 12:59:38 +02:00
Michael Peter Christen
1609763be5 toString fix 2014-08-04 12:58:39 +02:00
Michael Peter Christen
b983e68254 more retries, less sleep 2014-08-04 08:29:35 +02:00
Michael Peter Christen
1503ba7794 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-04 08:24:31 +02:00
reger
8f77719091 fix "Ljava.lang.String" in crawl queue anchor name
(e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)
2014-08-04 02:38:58 +02:00
Michael Peter Christen
0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
saving one query for double document detection now per document
2014-08-04 02:35:38 +02:00
reger
3963bca3b6 catch IndexControlRWIs_p error if RWI not connected 2014-08-04 00:03:42 +02:00
orbiter
38864ae004 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-08-03 22:44:49 +02:00
orbiter
4099296b45 added new classes which shall reduce call overhead to Solr (stub) 2014-08-03 22:44:22 +02:00
reger
d0c02e1de7 adjust rss lat/lon to double
(common format across other classes)
2014-08-03 20:09:23 +02:00
orbiter
3491ab4c38 removed unused images from webgraph edge computation 2014-08-01 13:21:16 +02:00
orbiter
2371d6b8db target linktexts must be string to enable search facets on these fields 2014-08-01 13:20:25 +02:00
Michael Peter Christen
001e05bb80 do not store failure of loading of robots.txt into the index as a fail
document
2014-08-01 12:15:14 +02:00
Michael Peter Christen
05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-01 12:04:25 +02:00
Michael Peter Christen
98f45c9032 fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
orbiter
22ce4fb4dd better error handling for remote solr queries and exists-checks 2014-08-01 11:00:10 +02:00
reger
b510b182d8 - update Maven pom
- add ppt parser test case
2014-08-01 01:47:53 +02:00
Marc Nause
3dcfc717eb This hopefully fixes http://mantis.tokeek.de/view.php?id=424 2014-07-29 22:02:11 +02:00
Marc Nause
9df14fc126 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-29 21:26:43 +02:00
Marc Nause
477be17c51 Replaced old UPNP library with Weupnp. UPNP should
work now, at least it does on my network. UPNP code in YaCy can still
be improved though (see TODO comment: make port on gateway configurable
or find free one).

*) removed old code
*) added new lib
*) changed code to work with new lib
2014-07-29 21:26:27 +02:00
orbiter
738989aab7 reverted commit f94c91315b because the
webgraph has not enough performance for that
2014-07-29 18:49:42 +02:00
orbiter
e9163e7e10 fix for malformed hostpath names in crawl balancer 2014-07-29 11:18:45 +02:00
orbiter
161a11070c yacystats is gone :( 2014-07-29 11:12:01 +02:00
Michael Peter Christen
c115f3869c enhanced snippet computation and test method in ViewFile 2014-07-28 15:42:57 +02:00
reger
6c10b59f3e move bootstrap peers test systems to its test class
var assignment not needed elsewhere.
2014-07-27 04:13:07 +02:00
reger
7328c2883b fix typo in .init description
http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:38:53 +02:00
reger
94819f0797 set .ini default boost fields to same as assigned by button "reset to default"
(in RankingSolr_p)
- fix typo http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:17:41 +02:00
reger
b4b937a046 update to pdfbox 1.8.6 2014-07-25 23:55:10 +02:00
orbiter
1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
formulated as edismax query but this was not set as query attribute. The
defType=edismax property needs a qf-field, so this was added as well. Do
not remove that field again! This also fixes a problem with title-unique
computation.
2014-07-25 18:53:13 +02:00
Michael Peter Christen
f94c91315b if the webgraph is used, then use it also for reference computation to
avoid contradictions with references_i in the collection index.
2014-07-24 15:35:53 +02:00
Michael Peter Christen
6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
Michael Peter Christen
c63e93df46 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-24 00:04:56 +02:00
Michael Peter Christen
1bf605b6d1 toString() fix 2014-07-24 00:04:46 +02:00