Commit Graph

11907 Commits

Author SHA1 Message Date
reger
41c4eade51 extract modification date from vCard (vcfParser) 2015-09-06 04:28:27 +02:00
reger
8768896975 extract lastmodified from openoffice doc
set lastmod date in office document parsers
2015-09-06 00:04:54 +02:00
Michael Peter Christen
c40c302748 when many crawl queues are generated, this NPE can occur; probably
caused by a concurrency issue:
W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException
java.lang.NullPointerException
	at java.util.TreeMap.rotateRight(TreeMap.java:2239)
	at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271)
	at java.util.TreeMap.put(TreeMap.java:582)
	at net.yacy.kelondro.table.Table.<init>(Table.java:235)
	at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229)
	at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204)
	at net.yacy.crawler.HostQueue.push(HostQueue.java:397)
	at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237)
	at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184)
	at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355)
	at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134)
	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101)
	at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2015-09-05 14:12:17 +02:00
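The trace above fails inside java.util.TreeMap.put, which is consistent with several threads writing to an unsynchronized TreeMap (TreeMap is not thread-safe). The following is a minimal sketch of one general way to guard such a map, not necessarily the change made in this commit; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

public class GuardedTreeMap {
    // Hypothetical demo: several threads insert disjoint keys into one shared map.
    // The synchronized wrapper serializes put() calls, so the tree's internal
    // rebalancing (rotateRight, fixAfterInsertion) never runs concurrently.
    static int fillConcurrently(int threads, int perThread) {
        SortedMap<Integer, Integer> map =
                Collections.synchronizedSortedMap(new TreeMap<Integer, Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait until every writer has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }
}
```

With the wrapper in place, the final size equals threads * perThread. With a bare TreeMap the same workload may lose entries or throw.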
Michael Peter Christen
94cfa63c46 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-09-05 14:07:53 +02:00
Michael Peter Christen
0a37d8af89 in case that a site crawl is started for urls with file:// path, the
host filter does not work because there is no host given in such urls.
In that case, patch the filter to be a sub-path filter.
2015-09-05 14:07:23 +02:00
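The idea described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the YaCy implementation: the class name, the method name and the exact shape of the generated regular expression are hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class SiteCrawlFilter {
    // Hypothetical helper: builds a must-match regex for a "site crawl".
    static String mustMatchFilter(String startUrl) {
        URI uri = URI.create(startUrl);
        String host = uri.getHost();
        if (host != null && !host.isEmpty()) {
            // Usual case: restrict the crawl to the host of the start URL.
            return Pattern.quote(uri.getScheme() + "://" + host) + "(/.*)?";
        }
        // file:// URLs carry no host, so a host filter cannot work.
        // Fall back to a sub-path filter: everything below the start directory.
        String prefix = startUrl.substring(0, startUrl.lastIndexOf('/') + 1);
        return Pattern.quote(prefix) + ".*";
    }
}
```

For a start URL such as file:///home/user/docs/index.html, the filter accepts anything under file:///home/user/docs/ and rejects sibling directories.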
reger
367fe388b9 fix exception throw after sendError in DefaultServlet
- reduce debug exception logs in crawler
2015-09-05 01:57:30 +02:00
Michael Peter Christen
348b8db9d2 Merge pull request #12 from luccioman/master
Updated french locale and added new translator utils
2015-09-04 17:05:06 +02:00
luccioman
9df249296a Return to main repository version 2015-09-04 13:52:03 +02:00
luccioman
9752bd5f88 Added utils to help translation without launching full YaCy application:
- translate all source files with a locale
- list all non translated files with a locale
2015-09-04 13:44:44 +02:00
luccioman
2f0f0180e2 Added a function to list files recursively. 2015-09-04 13:42:57 +02:00
luccioman
7e4c1d2282 Translator refactoring :
- deleted useless new StringBuilder allocation
- use of a new reusable FileNameFilter
- added javadoc
2015-09-04 13:42:10 +02:00
luccioman
c1d937a90c Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-04 09:57:49 +02:00
reger
7c1da173e0 fix missing license in image search
see http://mantis.tokeek.de/view.php?id=522
2015-09-03 23:36:57 +02:00
luccioman
f17863588f Updated french translations for yacysearchitem.html,
yacysearchtrailer.html and Steering.html files.
Corrected various labels.
2015-09-03 09:02:03 +02:00
luccioman
918ef72bbe Corrected br markup 2015-09-03 08:59:17 +02:00
luccioman
f88bb2277e Corrected bookmark link title 2015-09-03 08:58:14 +02:00
luccioman
802ea66d19 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-03 08:04:38 +02:00
reger
5297e80cda fix missing onclick in ConfigPortal
to enable checkbox
2015-09-03 00:59:14 +02:00
luccioman
cc8d6ad75f Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-02 08:51:42 +02:00
reger
802ccaead6 fix init of error cache, use latest faildates => load_date_dt 2015-09-02 02:36:31 +02:00
reger
dba7f15073 apply same size constraint on result image from doc
as for linked images
see 19f1308bf0
2015-09-01 23:22:48 +02:00
reger
5e45f1a460 enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field 2015-09-01 21:47:25 +02:00
luccioman
70e483ecc6 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-01 08:57:32 +02:00
reger
4cf875336c complete TODO: getFileExtension handle dot in query part
+ testcase
2015-08-31 23:28:03 +02:00
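The case named in this commit message (a dot inside the query part) can be illustrated with a small sketch. It is not the YaCy method: the class name and the assumption that the input is the file part of a URL (path plus optional query or fragment) are made up for the example.

```java
import java.util.Locale;

public class FileExtension {
    // Hypothetical sketch: returns the lower-case extension of the file part
    // of a URL, or "" if there is none. Dots after '?' or '#' are ignored.
    static String getFileExtension(String file) {
        int cut = file.length();
        int q = file.indexOf('?');
        if (q >= 0) cut = q;
        int h = file.indexOf('#');
        if (h >= 0 && h < cut) cut = h;
        String path = file.substring(0, cut);                      // drop query and fragment
        String name = path.substring(path.lastIndexOf('/') + 1);   // last path segment
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Cutting at the query separator first means that "/page.html?file=a.b.pdf" is treated as an HTML page, while "/dir/download?name=x.zip" has no extension at all.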
sixcooler
87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in a huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DocValues where it is possible.
For this I used the Api-Scheme as new basis for the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these settings will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
sixcooler
c729d089b6 French Translation update by Luc:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5671
2015-08-31 19:57:57 +02:00
luccioman
e0dda0c01c Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server.git 2015-08-31 10:26:40 +02:00
reger
eaf0e8ff2c start recording/indexing pixel size for image document
as for linked images
2015-08-31 01:58:36 +02:00
reger
c33229fc0c check mime prior to ext for metadata modification for images 2015-08-30 23:02:19 +02:00
reger
19f1308bf0 enforce the result images limit to > 16x16px
for linked images
http://mantis.tokeek.de/view.php?id=594
2015-08-30 02:19:52 +02:00
luccioman
a4509ea2ca Updated french translation for index.html, yacysearch.html and
simpleheader.template. Corrected special characters to use HTML entities
instead.
2015-08-27 09:47:30 +02:00
reger
250f6457f0 remove expired domain titan.deep-one.in from bootstrap.seedlist 2015-08-26 23:58:08 +02:00
luccioman
67799ce867 Updated translation of index.html, yacysearch.html and
simpleheader.template, corrected some special characters not written as
HTML entities.
2015-08-26 14:40:39 +02:00
reger
0e4ba0360b fix NPE on .yacyh result url of disconnected peer
(cleanup yacyshare remaining)
2015-08-25 23:26:17 +02:00
reger
7ed812a2bf log missing seed.port
in favour of exception to prevent repeating throws
2015-08-25 02:19:00 +02:00
reger
206883f80d fix: Preserve protocol in url proxy
to connect to http/https. Display warning if https target is viewed over http
2015-08-25 01:16:41 +02:00
reger
f7b0b3b7b3 avoid runtime exception by earlier testing for seed.ip=null 2015-08-23 23:01:20 +02:00
reger
0f80bc8309 upd to jsoup-1.8.3 2015-08-19 22:46:48 +02:00
Michael Peter Christen
906b5fd742 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-08-11 00:42:46 +02:00
Michael Peter Christen
8f90767889 fix for filesystem crawl 2015-08-11 00:42:26 +02:00
sixcooler
a3dd4be749 added / corrected charset to be 1.7 compatible.
@Orbiter: please check if this is ok for you
2015-08-10 20:53:20 +02:00
Michael Peter Christen
8028410ab7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-08-10 14:27:53 +02:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a pre-defined bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every category. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
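The directory layout described in this commit message (DATA/CLASSIFICATION/<context>/<category>.txt, with a mandatory negative.txt per context) can be checked with a short sketch like the one below. It only reads and validates the layout and does not train or apply any bayesian filter; the class and method names are hypothetical and not part of YaCy.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClassificationLayout {
    // Hypothetical validator: maps each context (sub-directory name) to its
    // sorted category names (the *.txt file names without extension).
    static Map<String, List<String>> readContexts(Path classificationDir) throws IOException {
        Map<String, List<String>> contexts = new TreeMap<>();
        try (DirectoryStream<Path> dirs =
                Files.newDirectoryStream(classificationDir, p -> Files.isDirectory(p))) {
            for (Path context : dirs) {
                List<String> categories = new ArrayList<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(context, "*.txt")) {
                    for (Path file : files) {
                        String name = file.getFileName().toString();
                        categories.add(name.substring(0, name.length() - ".txt".length()));
                    }
                }
                Collections.sort(categories);
                // Each context needs "negative" plus at least one custom category.
                if (!categories.contains("negative") || categories.size() < 2) {
                    throw new IllegalStateException("context " + context.getFileName()
                            + " needs negative.txt and at least one custom category");
                }
                contexts.put(context.getFileName().toString(), categories);
            }
        }
        return contexts;
    }
}
```

A context directory named "language" containing english.txt and negative.txt would be reported as the facet "language" with the categories "english" and "negative".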
reger
1409cabe8b exclude more default search fields from text copy to text_t
for metadata index documents
2015-08-09 21:01:30 +02:00
reger
e2e73258ca remove obsolete interface SearchAccumulator
and unused SRURSSConnector Thread inheritance
2015-08-08 18:35:49 +02:00
Michael Peter Christen
dbbad23e12 removed warnings 2015-08-03 05:37:34 +02:00
Michael Peter Christen
500cfa9457 enhanced logging 2015-08-03 05:17:22 +02:00
Michael Peter Christen
c14bc8d9b7 revert of fq transformation (recent fix) 2015-08-03 05:15:34 +02:00
Michael Peter Christen
203df5a750 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-08-03 05:02:26 +02:00
reger
fa08ca207e ! finish running crawls before applying !
Allow crawl urls up to 2048 characters
fix for http://mantis.tokeek.de/view.php?id=575
2015-08-03 00:49:24 +02:00