12012 Commits

Author SHA1 Message Date
Michael Peter Christen
500cfa9457 enhanced logging 2015-08-03 05:17:22 +02:00
Michael Peter Christen
c14bc8d9b7 revert of fq transformation (recent fix) 2015-08-03 05:15:34 +02:00
Michael Peter Christen
203df5a750 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-08-03 05:02:26 +02:00
reger
fa08ca207e ! finish running crawls before applying !
Allow crawl urls up to 2048 characters
fix for http://mantis.tokeek.de/view.php?id=575
2015-08-03 00:49:24 +02:00
reger
ee77f24e52 use some more declared HeaderFramework constants 2015-08-02 22:56:14 +02:00
reger
9e4043731d add missing ; in base.css 2015-08-02 21:36:44 +02:00
Michael Peter Christen
11a848da5a Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-08-02 14:53:36 +02:00
Michael Peter Christen
b94bd7f20a a collection of search query enhancements:
- fixed superfluous space in query field list
- fixed filter query logic
- removed look-ahead query which caused each new search page to
submit two solr queries
- fixed random solr result orders when solr scores were equal:
results were then re-ordered by YaCy using the document hash which came from
the solr object, and that appeared to be random. Now the hash of the url
is used and the score is additionally modified by the url length to
prevent this particular case from appearing at all.
2015-08-02 14:52:41 +02:00
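The tie-breaking idea described in the commit above can be illustrated with a minimal sketch. It is not the YaCy implementation: the `Hit` class, the penalty factor `1e-6`, the direction of the length adjustment (shorter urls win) and the use of `String.hashCode()` as the url hash are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {

    /** Hypothetical stand-in for a search hit; not a YaCy class. */
    static final class Hit {
        final String url;
        final double solrScore;

        Hit(String url, double solrScore) {
            this.url = url;
            this.solrScore = solrScore;
        }
    }

    /**
     * Assumption: every url character costs a tiny amount of score,
     * so equal solr scores rarely stay equal and shorter urls win.
     */
    static double adjustedScore(Hit h) {
        return h.solrScore - h.url.length() * 1e-6;
    }

    /**
     * Orders hits by adjusted score (descending). Remaining ties are broken
     * by a hash of the url (here String.hashCode(), an assumption) and then
     * by the url itself, so the order no longer depends on input order.
     */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> adjustedScore(h)).reversed()
                .thenComparingInt((Hit h) -> h.url.hashCode())
                .thenComparing((Hit h) -> h.url));
        List<String> urls = new ArrayList<>();
        for (Hit h : sorted) {
            urls.add(h.url);
        }
        return urls;
    }
}
```

Because every comparison key is derived from the url or the score, two runs over the same hits return the same order regardless of how the hits arrive.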
reger
5ba9924289 pom: have Maven dependency management decide on transitive Lucene dependencies 2015-08-02 03:39:58 +02:00
reger
dbe2594c38 replace deprecated myPublicLocalIP() in AbstractRemoteHandler 2015-08-02 00:53:49 +02:00
reger
6d3534e725 remove unused Transmission hit counter 2015-08-02 00:20:14 +02:00
reger
cb67eb7baf use more absolute path for config file opening
as suggested in pull request 5 (https://github.com/yacy/yacy_search_server/pull/5)
2015-08-01 23:54:26 +02:00
reger
92e5b217b6 upd to pdfbox-1.8.10 2015-08-01 00:25:40 +02:00
Michael Peter Christen
1ccbf739b1 added bayes filter from Philipp Nolte, originally taken from
https://github.com/ptnplanet/Java-Naive-Bayes-Classifier
and modified inside the loklak.org project. After optimization in loklak
it was inserted into the net.yacy.cora.bayes package. It shall be used
to create custom search navigation filters.

The original copyright notice was copied from the README.md from
https://github.com/ptnplanet/Java-Naive-Bayes-Classifier/blob/master/README.md
The original package domain was
de.daslaboratorium.machinelearning.classifier
2015-07-30 14:10:31 +02:00
Michael Peter Christen
1bced1ae60 using latest enhanced (un/)gzip methods from loklak for yacy 2015-07-30 13:39:10 +02:00
Michael Peter Christen
3e6657288d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-30 03:39:11 +02:00
Michael Peter Christen
de8cfbe1d7 added export option to export the fulltext of the search index text only 2015-07-30 03:21:40 +02:00
reger
165561706d upd to Solr-5.2.1 2015-07-30 00:16:09 +02:00
reger
2fb6ebe88a move java environment parameter setting disabling SNI (Server Name Indication) support for https connections from code to startup script, allowing the admin to easily and transparently alter the YaCy default setting of FALSE.
Background: some users report problems connecting to or crawling some sites via https which require SNI support (switched off by default in YaCy). On the other hand, systems not demanding SNI support are sometimes not properly configured, and due to a bug/feature in java 1.7 the connection is aborted. The latter is more often the case, so the default is still fine. With the java start parameter, expert users can now alter the start parameter to -Djsse.enableSNIExtension=true (java default) if they crawl more hosts requiring SNI support.
The alternative of letting YaCy try both during the https handshake (deep inside the httpclient) is not pursued at this time.
2015-07-29 23:30:05 +02:00
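To check which SNI setting a JVM was actually started with, something like the following sketch may help. The property name `jsse.enableSNIExtension` comes from the commit message above; the class name and the helper `sniEnabled` are hypothetical, and the helper's parsing is a simplification (the JDK's own handling of unexpected values may be stricter).

```java
public class SniSettingSketch {

    /** The JSSE system property named in the commit message. */
    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    /**
     * Simplified reading of the property value (an assumption, not JDK code):
     * an unset property means the Java default (SNI enabled), and only an
     * explicit "false" is treated as disabled.
     */
    static boolean sniEnabled(String propertyValue) {
        return propertyValue == null || !"false".equalsIgnoreCase(propertyValue.trim());
    }

    public static void main(String[] args) {
        // Reads the value passed on the command line via -Djsse.enableSNIExtension=...
        String value = System.getProperty(SNI_PROPERTY);
        System.out.println(SNI_PROPERTY + "=" + value + " -> SNI enabled: " + sniEnabled(value));
    }
}
```

Started with `java -Djsse.enableSNIExtension=false SniSettingSketch`, the sketch would report the YaCy default described above; without the parameter it reports the Java default.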
Michael Peter Christen
fbeae20b3a try a healing of the cache if the index file is corrupted 2015-07-27 15:16:08 +02:00
Michael Peter Christen
7e158ae085 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-27 15:03:34 +02:00
Michael Peter Christen
03ea723889 added log lines for query performance profiling 2015-07-27 15:03:13 +02:00
reger
7f49dbfbd1 upd to SLF4J-1.7.12 2015-07-27 00:57:19 +02:00
reger
807e3dc78a upd to httpclient-4.5 and httpmime-4.5 2015-07-26 00:53:40 +02:00
reger
202620b4a2 upd to icu4j-55.1.jar 2015-07-25 00:50:41 +02:00
reger
149e41f25b upd to jsch-0.1.53.jar 2015-07-21 22:31:34 +02:00
Kirill Fomchenko
ab22a32c09 Fixed CSS scrolling
When the sidebar on the search page becomes scrollable, the scrollbar shrinks the sidebar and makes the search results weirdly scrollable on the X axis by several pixels. Now the sidebar always has a scrollbar, and results are never X-scrollable.
2015-07-21 08:21:10 +03:00
reger
30135d8964 upd to lib/weupnp-0.1.3.jar 2015-07-20 03:45:23 +02:00
Michael Peter Christen
ec75959162 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-16 23:42:51 +02:00
Michael Peter Christen
785781253e added jsonp to suggest servlet 2015-07-16 23:42:41 +02:00
reger
5cf988f224 upd NB classpath 2015-07-15 01:04:59 +02:00
Michael Peter Christen
32a804b10c Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-13 12:15:58 +02:00
Michael Peter Christen
0e87a99ab8 more fixes for special windows paths 2015-07-10 17:34:29 +02:00
Michael Peter Christen
e5b6424eed patch for bad windows file paths 2015-07-10 17:14:14 +02:00
Michael Peter Christen
0aa6fcf259 remove old vocabularies and synonyms before adding new 2015-07-10 16:47:19 +02:00
Michael Peter Christen
e1cd9c0dba added another default network / commented out 2015-07-09 16:25:11 +02:00
Michael Peter Christen
289018b559 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-08 17:37:03 +02:00
Michael Peter Christen
7b412e8c07 added msg (text emails) format; should be handled by html parser. 2015-07-08 17:36:37 +02:00
reger
f91298d3b6 fix one implicit Integer/Long type conversion
-> causes Java 1.8 compile error
2015-07-08 03:02:10 +02:00
reger
821262a179 add CommonPattern for multiple spaces
to eliminate empty split words on following spaces
2015-07-04 22:49:01 +02:00
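The effect described in the commit above can be reproduced with plain `java.util.regex`. This is a sketch only: the constant names `SINGLE_SPACE` and `MULTI_SPACE` are hypothetical and may not match the fields in YaCy's `CommonPattern` class.

```java
import java.util.regex.Pattern;

public class MultiSpaceSplitSketch {

    /** Hypothetical constant names; the actual fields in YaCy may differ. */
    static final Pattern SINGLE_SPACE = Pattern.compile(" ");
    static final Pattern MULTI_SPACE = Pattern.compile(" +");

    /** Splitting on a single space yields an empty word for every extra space. */
    static String[] splitSingle(String text) {
        return SINGLE_SPACE.split(text);
    }

    /** Splitting on a run of spaces treats consecutive spaces as one separator. */
    static String[] splitMulti(String text) {
        return MULTI_SPACE.split(text);
    }
}
```

For the input `"solr  yacy search"` (two spaces after `solr`), `splitSingle` returns four elements including one empty string, while `splitMulti` returns the three words.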
Michael Peter Christen
90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
during surrogate reading: those attributes from the dump are removed
during the import process and replaced by new detected attributes
according to the setting of the YaCy peer.
This may cause all such attributes to be removed if the importing
peer has no synonyms and/or no vocabularies defined.
2015-07-02 00:23:50 +02:00
Michael Peter Christen
7829480b82 refactoring: separated condenser and tokenizer 2015-07-01 18:28:18 +02:00
reger
00d2062813 Rem deprecated AdminHandlers in solrconfig.xml
avoid warning log
W  org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/"  class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
2015-07-01 00:58:23 +02:00
Michael Peter Christen
f901e7d3cf fix for non-authorized view of IndexBrowser: show only the number of
non-failure documents
2015-06-30 11:12:36 +02:00
Michael Peter Christen
593de05922 enhanced surrogate import process speed (dramatically!) 2015-06-29 12:28:34 +02:00
Michael Peter Christen
3c4c69adea fix for
- bad regex computation for crawl start from file (limitation on domain
did not work)
- servlet error when starting crawl from a large list of urls
2015-06-29 02:02:01 +02:00
Michael Peter Christen
1fec7fb3c1 suppress access to solr when doing search suggestions if the
index has more than two million documents. This protects the index from
being flooded with search requests that cannot be resolved before the
real search query has to be computed.
2015-06-24 13:02:12 +02:00
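The guard described in the commit above amounts to a size check before the suggestion lookup. A minimal sketch follows, with hypothetical class, constant and method names; only the two-million threshold comes from the commit message.

```java
public class SuggestGuardSketch {

    /** Threshold from the commit message: "more than two million documents". */
    static final long MAX_DOCS_FOR_SOLR_SUGGEST = 2_000_000L;

    /**
     * Returns true if suggestions may query solr. For larger indexes the
     * caller would skip the solr lookup so that suggestion requests do not
     * pile up ahead of the real search query.
     */
    static boolean mayQuerySolrForSuggestions(long indexDocumentCount) {
        return indexDocumentCount <= MAX_DOCS_FOR_SOLR_SUGGEST;
    }
}
```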
Michael Peter Christen
886fca2260 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-24 01:59:46 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes have been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
has been updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend re-indexing everything.
Before you do the update, please use the index export to write a large
surrogate xml file. After the update, start with an empty index and then
initialize it with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
6c2e6f1f37 remove redundant code 2015-06-23 23:41:43 +02:00