reger
206883f80d
fix: Preserve protocol in url proxy
to connect to http/https. Display a warning if an https target is viewed over http
2015-08-25 01:16:41 +02:00
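The commit above keeps the target's protocol when proxying and warns when an https page is viewed over http. Below is a minimal sketch of that decision, assuming the proxy knows the scheme of the incoming request and the full target URL. `ProxyProtocol`, `upstreamScheme` and `needsMixedProtocolWarning` are hypothetical names, not YaCy's URL proxy code.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper (not YaCy code): the upstream connection always uses the
 * target's own scheme, and a warning is raised when an https target is viewed
 * through a proxy page that was itself requested over plain http.
 */
public final class ProxyProtocol {
    private ProxyProtocol() {}

    /** Scheme for the upstream connection: the target's scheme, never the proxy's. */
    public static String upstreamScheme(String targetUrl) {
        String scheme = URI.create(targetUrl).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("target URL has no scheme: " + targetUrl);
        }
        return scheme.toLowerCase(Locale.ROOT);
    }

    /** True when the target is https but the proxy page was requested over http. */
    public static boolean needsMixedProtocolWarning(String proxyRequestScheme, String targetUrl) {
        return "https".equals(upstreamScheme(targetUrl))
                && "http".equalsIgnoreCase(proxyRequestScheme);
    }

    public static void main(String[] args) {
        System.out.println(upstreamScheme("https://example.org/page"));
        System.out.println(needsMixedProtocolWarning("http", "https://example.org/"));
    }
}
```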
reger
f7b0b3b7b3
avoid runtime exception by earlier testing for seed.ip=null
2015-08-23 23:01:20 +02:00
reger
0f80bc8309
upd to jsoup-1.8.3
2015-08-19 22:46:48 +02:00
Michael Peter Christen
906b5fd742
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-08-11 00:42:46 +02:00
Michael Peter Christen
8f90767889
fix for filesystem crawl
2015-08-11 00:42:26 +02:00
sixcooler
a3dd4be749
added / corrected charsets to be 1.7 compatible.
@Orbiter: please check if this is ok for you
2015-08-10 20:53:20 +02:00
Michael Peter Christen
8028410ab7
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-08-10 14:27:53 +02:00
Michael Peter Christen
df3314ac1a
added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing time using a predefined bayesian filter.
New terms:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have at least two categories: at least one with a custom name (the
things you want to discover) and one with the exact name "negative".
To use this, you must do the following:
- for each context, create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, create one text file per category,
with one document per line. One of these files MUST be named
'negative.txt'.
Then each new document is classified into one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
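The directory layout described in the classification commit above can be read with a small loader like the one below. This is a sketch under the stated layout only (`DATA/CLASSIFICATION/<context>/<category>.txt`, one document per line, UTF-8 assumed); the class `ClassificationLayout` and its validation message are hypothetical and not YaCy's actual loader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch (not YaCy's loader): reads <dir>/<context>/<category>.txt, where the
 * context name is the facet name and each line of a category file is one
 * training document.
 */
public final class ClassificationLayout {
    private ClassificationLayout() {}

    public static Map<String, Map<String, List<String>>> load(Path classificationDir) throws IOException {
        Map<String, Map<String, List<String>>> contexts = new TreeMap<>();
        try (DirectoryStream<Path> contextDirs =
                     Files.newDirectoryStream(classificationDir, entry -> Files.isDirectory(entry))) {
            for (Path contextDir : contextDirs) {
                String context = contextDir.getFileName().toString(); // context name = facet name
                Map<String, List<String>> categories = new TreeMap<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(contextDir, "*.txt")) {
                    for (Path file : files) {
                        String name = file.getFileName().toString();
                        String category = name.substring(0, name.length() - ".txt".length());
                        categories.put(category, Files.readAllLines(file, StandardCharsets.UTF_8));
                    }
                }
                // Each context needs "negative" plus at least one custom category.
                if (!categories.containsKey("negative") || categories.size() < 2) {
                    throw new IllegalStateException("context '" + context
                            + "' needs negative.txt and at least one other category file");
                }
                contexts.put(context, categories);
            }
        }
        return contexts;
    }
}
```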
reger
1409cabe8b
exclude more default search fields from text copy to text_t
for metadata index documents
2015-08-09 21:01:30 +02:00
reger
e2e73258ca
remove obsolete interface SearchAccumulator
and unused SRURSSConnector Thread inheritance
2015-08-08 18:35:49 +02:00
Michael Peter Christen
dbbad23e12
removed warnings
2015-08-03 05:37:34 +02:00
Michael Peter Christen
500cfa9457
enhanced logging
2015-08-03 05:17:22 +02:00
Michael Peter Christen
c14bc8d9b7
revert of fq transformation (recent fix)
2015-08-03 05:15:34 +02:00
Michael Peter Christen
203df5a750
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-08-03 05:02:26 +02:00
reger
fa08ca207e
! finish running crawls before applying !
Allow crawl urls up to 2048 characters
fix for http://mantis.tokeek.de/view.php?id=575
2015-08-03 00:49:24 +02:00
reger
ee77f24e52
use some more declared HeaderFramework constants
2015-08-02 22:56:14 +02:00
reger
9e4043731d
add missing ; in base.css
2015-08-02 21:36:44 +02:00
Michael Peter Christen
11a848da5a
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-08-02 14:53:36 +02:00
Michael Peter Christen
b94bd7f20a
a collection of search query enhancements:
- fixed superfluous space in query field list
- fixed filter query logic
- removed look-ahead query which caused each new search page to
submit two solr queries
- fixed random solr result order in case the solr scores were equal:
such results were then re-ordered by YaCy using the document hash which
came from the solr object, and that hash appeared to be random. Now the
hash of the url is used, and the score is additionally modified by the
url length to prevent this particular case from occurring at all.
2015-08-02 14:52:41 +02:00
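The ordering fix described above (URL-length score adjustment plus a URL-hash tie-break) can be sketched as a comparator. Assumptions, all hypothetical: an MD5 hex digest stands in for YaCy's own URL hash, the penalty weight `1e-6` per character is an arbitrary illustrative value, and `StableRanking`/`Hit` are not YaCy classes. The point is only that every key is derived from the URL, so the order no longer depends on the Solr document object.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;

/** Sketch of deterministic ordering for equal Solr scores (not YaCy's code). */
public final class StableRanking {
    private StableRanking() {}

    /** Hypothetical search hit: Solr score plus document URL. */
    public static final class Hit {
        public final String url;
        public final float score;
        public Hit(String url, float score) { this.url = url; this.score = score; }
    }

    /** Shorter URLs win when Solr scores are equal (illustrative weight). */
    static double adjustedScore(Hit h) {
        return h.score - h.url.length() * 1e-6;
    }

    /** Deterministic tie-break key computed from the URL (MD5 stands in for YaCy's URL hash). */
    static String urlHash(String url) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Highest adjusted score first; remaining ties broken by URL hash, ascending. */
    public static final Comparator<Hit> ORDER =
            Comparator.<Hit>comparingDouble(StableRanking::adjustedScore).reversed()
                    .thenComparing(h -> urlHash(h.url));
}
```

Sorting a result list with `hits.sort(StableRanking.ORDER)` then gives the same order regardless of the order in which Solr returned equally scored documents.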
reger
5ba9924289
pom: have Maven dependency management decide on transitive Lucene dependencies
2015-08-02 03:39:58 +02:00
reger
dbe2594c38
replace deprecated myPublicLocalIP() in AbstractRemoteHandler
2015-08-02 00:53:49 +02:00
reger
6d3534e725
remove unused Transmission hit counter
2015-08-02 00:20:14 +02:00
reger
cb67eb7baf
use more absolute path for config file opening
as suggested in pull request 5 (https://github.com/yacy/yacy_search_server/pull/5)
2015-08-01 23:54:26 +02:00
reger
92e5b217b6
upd to pdfbox-1.8.10
2015-08-01 00:25:40 +02:00
Michael Peter Christen
1ccbf739b1
added bayes filter from Philipp Nolte, originally taken from
https://github.com/ptnplanet/Java-Naive-Bayes-Classifier
and modified inside the loklak.org project. After optimization in loklak
it was inserted into the net.yacy.cora.bayes package. It shall be used
to create custom search navigation filters.
The original copyright notice was copied from the README.md from
https://github.com/ptnplanet/Java-Naive-Bayes-Classifier/blob/master/README.md
The original package domain was
de.daslaboratorium.machinelearning.classifier
2015-07-30 14:10:31 +02:00
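For readers unfamiliar with bayes filters, the idea behind the classifier added above can be shown with a generic multinomial naive Bayes with Laplace smoothing. This is a swapped-in textbook illustration, not the API or the implementation of the Java-Naive-Bayes-Classifier port in `net.yacy.cora.bayes`; the class `TinyBayes` and its tokenizer (lower-case, split on non-word characters) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial naive Bayes sketch with Laplace smoothing (hypothetical, not the ported library). */
public final class TinyBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Adds one training document to a category. */
    public void learn(String category, String text) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the category with the highest log posterior. */
    public String classify(String text) {
        List<String> tokens = tokenize(text);
        int v = vocabulary.size();
        String best = null;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            double log = Math.log(docCounts.get(category) / (double) totalDocs); // log prior
            Map<String, Integer> counts = wordCounts.get(category);
            int n = totalWords.getOrDefault(category, 0);
            for (String w : tokens) {
                // Laplace smoothing: unseen words get count 1 instead of 0
                log += Math.log((counts.getOrDefault(w, 0) + 1.0) / (n + v));
            }
            if (log > bestLog) {
                bestLog = log;
                best = category;
            }
        }
        return best;
    }
}
```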
Michael Peter Christen
1bced1ae60
using latest enhanced (un/)gzip methods from loklak for yacy
2015-07-30 13:39:10 +02:00
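The loklak gzip helpers themselves are not shown in this log. As a point of reference, an in-memory gzip/gunzip round trip with the standard `java.util.zip` classes looks like the sketch below; the class name `GzipUtil` is hypothetical and this is not the loklak code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** In-memory gzip helpers using java.util.zip (hypothetical, not the loklak methods). */
public final class GzipUtil {
    private GzipUtil() {}

    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the stream writes the gzip trailer (CRC32 and length)
        return bos.toByteArray();
    }

    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```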
Michael Peter Christen
3e6657288d
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-07-30 03:39:11 +02:00
Michael Peter Christen
de8cfbe1d7
added an export option to export the fulltext of the search index as text only
2015-07-30 03:21:40 +02:00
reger
165561706d
upd to Solr-5.2.1
2015-07-30 00:16:09 +02:00
reger
2fb6ebe88a
move the java environment parameter that disables SNI (Server Name Indication) support for https connections from the code to the startup script, allowing admins to easily and transparently change the YaCy default setting (FALSE).
Background: some users report problems connecting to/crawling some sites via https that require SNI support (switched off by default in YaCy). On the other hand, systems not demanding SNI support are sometimes not properly configured, and due to a bug/feature in java 1.7 the connection is then aborted. The latter is more often the case, so the default is still fine. Expert users can now change the java start parameter to -Djsse.enableSNIExtension=true (the java default) if they crawl more hosts requiring SNI support.
The alternative of letting YaCy try both during the https handshake (deep inside the httpclient) is not pursued at this time.
2015-07-29 23:30:05 +02:00
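The SNI switch in the commit above is the JSSE system property `jsse.enableSNIExtension`. JSSE appears to read it when its SSL classes are first initialized, which is one reason passing it on the `java` command line (for example `-Djsse.enableSNIExtension=true`) is more reliable than setting it from code later. The sketch below only reports the effective value, treating an unset property as the JDK default `true`; `SniSetting` is a hypothetical helper, e.g. for a startup log line, not YaCy code.

```java
/** Hypothetical helper that reports the effective SNI setting (not YaCy code). */
public final class SniSetting {
    private SniSetting() {}

    /** JSSE system property controlling the SNI extension. */
    public static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    /** Effective value; an unset property is treated as the JDK default (true). */
    public static boolean sniEnabled() {
        return Boolean.parseBoolean(System.getProperty(SNI_PROPERTY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(SNI_PROPERTY + " = " + sniEnabled());
    }
}
```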
Michael Peter Christen
fbeae20b3a
try a healing of the cache if the index file is corrupted
2015-07-27 15:16:08 +02:00
Michael Peter Christen
7e158ae085
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-07-27 15:03:34 +02:00
Michael Peter Christen
03ea723889
added log lines for query performance profiling
2015-07-27 15:03:13 +02:00
reger
7f49dbfbd1
upd to SLF4J-1.7.12
2015-07-27 00:57:19 +02:00
reger
807e3dc78a
upd to httpclient-4.5 and httpmime-4.5
2015-07-26 00:53:40 +02:00
reger
202620b4a2
upd to icu4j-55.1.jar
2015-07-25 00:50:41 +02:00
reger
149e41f25b
upd to jsch-0.1.53.jar
2015-07-21 22:31:34 +02:00
reger
30135d8964
upd to lib/weupnp-0.1.3.jar
2015-07-20 03:45:23 +02:00
Michael Peter Christen
ec75959162
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-07-16 23:42:51 +02:00
Michael Peter Christen
785781253e
added jsonp to suggest servlet
2015-07-16 23:42:41 +02:00
reger
5cf988f224
upd NB classpath
2015-07-15 01:04:59 +02:00
Michael Peter Christen
32a804b10c
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-07-13 12:15:58 +02:00
Michael Peter Christen
0e87a99ab8
more fixes for special windows paths
2015-07-10 17:34:29 +02:00
Michael Peter Christen
e5b6424eed
patch for bad windows file paths
2015-07-10 17:14:14 +02:00
Michael Peter Christen
0aa6fcf259
remove old vocabularies and synonyms before adding new
2015-07-10 16:47:19 +02:00
Michael Peter Christen
e1cd9c0dba
added another default network / commented out
2015-07-09 16:25:11 +02:00
Michael Peter Christen
289018b559
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2015-07-08 17:37:03 +02:00
Michael Peter Christen
7b412e8c07
added msg (text emails) format; should be handled by html parser.
2015-07-08 17:36:37 +02:00
reger
f91298d3b6
fix one implicit Integer/Long type conversion
-> causes Java 1.8 compile error
2015-07-08 03:02:10 +02:00
reger
821262a179
add CommonPattern for multiple spaces
to eliminate empty words when splitting on consecutive spaces
2015-07-04 22:49:01 +02:00
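The problem addressed by the last commit: `String.split(" ")` yields an empty word for every extra space, while a precompiled pattern for one or more spaces does not. A sketch; `SpaceSplit` and its `SPACES` field are hypothetical stand-ins for the actual `CommonPattern` entry. The input is trimmed first because a match at the very start of the string would otherwise still produce a leading empty word.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a precompiled CommonPattern entry. */
public final class SpaceSplit {
    private SpaceSplit() {}

    /** One or more spaces, compiled once and reused. */
    static final Pattern SPACES = Pattern.compile(" +");

    /** Splits into words without empty entries; trim() removes leading/trailing whitespace. */
    public static String[] words(String s) {
        return SPACES.split(s.trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a  b".split(" ")));  // [a, , b]
        System.out.println(Arrays.toString(words("a  b")));      // [a, b]
    }
}
```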