4912 Commits

Author SHA1 Message Date
Michael Peter Christen
a86c2fe77d fixed usage of media flag when started by automated process 2014-02-22 01:44:08 +01:00
Michael Benz
f11314aae7 Improved German de.lng translation and fixed adresses -> addresses in \htroot\CrawlStartScanner_p.html 2014-02-20 08:47:32 +01:00
Michael Peter Christen
f0eec6d0f3 Merge branch 'master' of git://gitorious.org/~copro/yacy/copros-rc1 2014-02-20 00:50:48 +01:00
Michael Benz
6278af4993 Edit German de locale and improved translation 2014-02-20 00:32:20 +01:00
Michael Peter Christen
69391e5d9e changed strategy to test existence of documents in Solr: using the
update time. The reason for that is a better caching for the crawler
double-check, which needs the update time for crawler steering.
2014-02-19 04:03:45 +01:00
reger
a02e33dcb6 add edit-link to PK field of table admin 2014-02-16 02:26:11 +01:00
Michael Peter Christen
9eb668e951 enhanced the resource observer
The resource observer is now able to recognize free disk space AND
available space for YaCy. The amount of space which is assigned for YaCy
is defined in new settings in the configuration file.
Furthermore, there is now a cleanup process which deletes files in case
that an autodelete is activated. The autodelete is now BY DEFAULT ON if
the disk space is low, which means that YaCy starts to delete documents
when the disk is full!
2014-02-12 01:00:44 +01:00
Michael Peter Christen
cb2c25d930 in case that the crawler is running and the search user is the peer
admin, we expect that the user wants to check recently crawled documents.
To ensure that recent crawl results are inside the search results, we do
a soft commit here.
2014-02-11 22:02:10 +01:00
Michael Peter Christen
bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
database and not needed any more
2014-02-11 22:01:25 +01:00
Michael Peter Christen
bc28247089 Added methods in resource observer to calculate the available and the
occupied disc space. These values are also shown on the status page.
The disc space calculation shall be used for a disk-limitation of the
search index.
2014-02-11 03:20:03 +01:00
reger
365f77ea8c make internal page links relative to ease any future development for context aware servlets
note also http://bugs.yacy.net/view.php?id=106
2014-02-10 21:40:42 +01:00
Michael Peter Christen
d9858e1b8a removed warnings and superfluous logging 2014-02-09 12:26:58 +01:00
Michael Peter Christen
7e71dcc417 removed interaction fragments 2014-02-09 12:25:07 +01:00
Michael Peter Christen
94245ce0a8 fixed "Size in KBytes" calculation in PerformanceQueues_p.html,
see http://bugs.yacy.net/view.php?id=362
2014-02-07 17:19:08 +01:00
Michael Peter Christen
726e8c3ad5 removed unused classes and servlets 2014-02-07 01:47:10 +01:00
Michael Peter Christen
6e59ca4ebf removed jena library and all code that depended on jena. When jena was
introduced, it was also used for search facets. The generic search
facets are now deduced from generic solr fields which makes jena as tool
for facet semantics superfluous.
2014-02-07 01:20:06 +01:00
Michael Peter Christen
0e6729f9bc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-07 00:23:50 +01:00
Michael Peter Christen
9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table 2014-02-07 00:22:31 +01:00
Michael Peter Christen
e8bdf16ea7 added statistic information for solr resources in PerformanceMemory 2014-02-07 00:02:19 +01:00
reger
1a2b298a65 fix: select all checkbox Tables_p
(needs form name attribute)
2014-02-06 23:15:00 +01:00
Michael Peter Christen
931541d198 re-inserted default value re-set button to performance queues and
patched missing values for recent new queues
2014-02-06 22:39:19 +01:00
reger
bd1685c94a fix not needed getFileExtension().toLower (double)
add missing .getFileExtension
2014-02-05 03:45:02 +01:00
orbiter
a11f072504 enhanced didyoumean 2014-02-04 00:18:11 +01:00
Michael Peter Christen
bc395c7439 reduced color depth of star icons (for smaller file sizes) 2014-02-03 17:39:59 +01:00
Michael Peter Christen
9e0e39a9a4 small change to start/stop/pause icon style 2014-02-03 17:39:26 +01:00
orbiter
22e3524797 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-03 12:45:35 +01:00
orbiter
c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
instead of computing suggest permutations of the given words, the
completion of a phrase using the given words is searched in the fulltext
index.
2014-02-03 12:44:52 +01:00
reger
ad4b213145 remove unused static var from HTTPDProxyHandler 2014-02-02 03:47:12 +01:00
reger
6c6056836d fix vocabulary navigator checkbox selection (from last commit) 2014-01-31 23:03:01 +01:00
reger
cb71413d19 fix page nav to keep the modifier
(was new issue)
2014-01-30 22:00:32 +01:00
orbiter
ba5ab11cc4 less logging 2014-01-27 21:54:52 +01:00
Michael Peter Christen
322854a5f8 fix auth for forced ping 2014-01-27 15:56:02 +01:00
Michael Peter Christen
fbf4f77d80 fixed missing corona in network picture 2014-01-27 15:43:08 +01:00
Michael Peter Christen
d2b8f2b477 enhancements for staticIP and ipv6 handling 2014-01-27 13:48:20 +01:00
reger
91d79c1ac4 disable wrong forward to https on port change 2014-01-26 21:50:42 +01:00
reger
193b8235c2 remove double jquery-1.3.1.js and adjust header links to jquery-1.3.2 2014-01-26 00:58:54 +01:00
reger
f307d65dcf prepare for a language navigator
works fine to restrict language for local solrSearches.
More work needs to be done to make rwi/remote searches respect the modifier.language restriction.
2014-01-24 03:11:25 +01:00
orbiter
768b1306b8 Added a write-enabled checkbox for remote solr servers.
It is now possible to assign every peer other YaCy peers as remote solr
server which are only used for read operations during search. This also
affects crawling: it will exclude urls from crawls which exist on remote
solr/remote YaCy peers.
2014-01-23 22:48:31 +01:00
orbiter
f7d6dd136f changed solr paths according to new default paths 2014-01-23 19:21:07 +01:00
Michael Peter Christen
8b14e92ba4 added button in host browser to re-load 404/failed documents 2014-01-23 15:56:36 +01:00
reger
f47067b0ce fix search navigator not showing activated nav
introduced with 97e84439fb
2014-01-23 01:52:51 +01:00
reger
9a96a7d73f put list quick navigator buttons below on BlackList_p editor
replacing the dropdown -> go navigation
2014-01-21 21:35:48 +01:00
Michael Peter Christen
6ada0daae9 making latency_factor and maximum number of same hosts in loader queue
settings available in Crawler_p.html servlet for steering.
2014-01-21 19:28:00 +01:00
Michael Peter Christen
be5e808236 - removed hardcoded load-test which is now handled in BusyQueues
steering, see /PerformanceQueues_p.html
- changed default values for crawler queue load limit (high, because
these jobs are started upon user request)
2014-01-21 17:48:45 +01:00
sixcooler
40a4030b55 configurable max-load values for YaCy-Threads:
try lower values on small systems like a Pi
2014-01-21 17:04:22 +01:00
Michael Peter Christen
77531850b5 reverted crawling strategy from latest commit. 2014-01-21 16:05:55 +01:00
Michael Peter Christen
c0da966dfa enhanced crawler speed 2014-01-20 21:46:40 +01:00
Michael Peter Christen
1ea17bd9f3 - removed old metadata database and all migration code
- refactored all code which uses URIMetadataRow as standard for word
hash length and word hash ordering and moved that to the class 'Word',
because the class URIMetadataRow defined the old metadata data structure
and should be superfluous in the future
- removed unused methods from URIMetadataRow as preparation for further
removal of that class
2014-01-20 18:31:46 +01:00
reger
97e84439fb adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString
- since the specific heuristics Twitter & Blekko are no longer available or redundant with OpenSearchHeuristic,
adjusted ConfigHeuristic to use OpensearchHeuristic settings only.
For this the default OSD search target list is made available (copied) by default and the other configs are removed.

- the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object,
but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns
just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers)

- started to adjust internal html href references from absolute to relative (currently it is mixed).
For future development we should prefer relative href targets (less trouble with context aware  servlets)
2014-01-20 00:58:17 +01:00
orbiter
fd4abc0565 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-19 01:50:55 +01:00
orbiter
d5b8e473c8 added load limit for DHT transfer: RWI acceptance only if local load is
not too high
2014-01-19 01:50:42 +01:00
reger
41c126978b fix bug: Crawl Start (Expert) crawls "?-URLs" even if told not to do so
http://bugs.yacy.net/view.php?id=329
2014-01-18 23:27:16 +01:00
Michael Peter Christen
a9ed28c0b5 no commit if no action is requested 2014-01-17 14:54:44 +01:00
reger
0c754dd794 implemented DIGEST authentication, which is more secure for remote login
than BASIC, where the pwd is transmitted in near clear text (B64enc).
This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST.

!!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash
- default authentication is still BASIC
- configuration at this time only manually in (DATA/settings) or  defaults/web.xml  (<auth-method>
- the realmname is in defaults/yacy.init  adminRealm=YaCy-AdminUI
- fyi: the realmname is shown on login screen
- changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin)
- implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST
- to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes (  "MD5:hash" )
2014-01-17 00:02:23 +01:00
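
The hash format described in the commit above can be sketched in Java roughly as follows. This is a minimal illustration and not the YaCy or Jetty implementation: the class name DigestHashSketch, the method digestHash, the UTF-8 encoding and the example credentials are all assumptions. Only the MD5(user:realm:pwd) form, the "MD5:" prefix and the realm name YaCy-AdminUI are taken from the commit message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of YaCy: builds a DIGEST-style password hash.
class DigestHashSketch {

    // Returns "MD5:" + hex(MD5(user:realm:pwd)), the form named in the commit message.
    static String digestHash(String user, String realm, String pwd) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // UTF-8 is an assumption; the commit message does not name an encoding.
            byte[] raw = md5.digest((user + ":" + realm + ":" + pwd).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("MD5:");
            for (byte b : raw) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        System.out.println(digestHash("admin", "YaCy-AdminUI", "example-password"));
    }
}
```

Because the realm is part of the hashed string, a different realm name yields a different hash for the same user and password, which fits the note above that changing the realm name invalidates all passwords.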
Michael Peter Christen
f8ce7040ab remote search peer selection schema change:
- all non-dht targets (previously separated into 'robinson' for dht-like
queries and 'node' for solr queries) are now 'extra' peers, which are
queried using solr
- these extra-peers are now selected using a ranking on last-seen,
peer-tag-matches, node-peer flags, peer age, and link count. The ranking
is done using a weight and a random factor.
- the number of extra peers is 50% of the dht peers
- the dht peers now exclude too young peers to prevent bad results
during strong growth of the network
- the number of dht peers (and therefore extra-peers) is reduced when
the memory of the peer is low and/or some documents still appear in the
indexing-queue. This shall prevent a peer from deadlocks when p2p
queries are made in a fast sequence on weak hardware.
2014-01-16 17:27:14 +01:00
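
The selection scheme in the commit above (a weighted ranking plus a random factor, with the number of extra peers at 50% of the dht peers) might look roughly like the following sketch. It is not the YaCy code: the class ExtraPeerSketch, the weight values, the normalization of features to 0..1 and all method names are hypothetical. Only the list of ranking criteria and the 50% rule come from the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of ranking extra peers by weighted features plus a random factor.
class ExtraPeerSketch {

    // Assumed weights, in the order: last-seen, peer-tag matches, node-peer flag, peer age, link count.
    static final double[] WEIGHTS = { 3.0, 2.0, 2.0, 1.0, 1.0 };

    static class Peer {
        final String name;
        final double[] features; // assumed to be normalized to 0..1, same order as WEIGHTS
        double score;

        Peer(String name, double[] features) {
            this.name = name;
            this.features = features;
        }
    }

    // The commit states that the number of extra peers is 50% of the dht peers.
    static int extraPeerCount(int dhtPeers) {
        return dhtPeers / 2;
    }

    // Scores every candidate and returns the best ones, highest score first.
    static List<Peer> select(List<Peer> candidates, int dhtPeers, double randomFactor, Random rng) {
        List<Peer> ranked = new ArrayList<>(candidates);
        for (Peer p : ranked) {
            double s = 0.0;
            for (int i = 0; i < WEIGHTS.length; i++) {
                s += WEIGHTS[i] * p.features[i];
            }
            // The random term lets lower-ranked peers be chosen occasionally.
            p.score = s + randomFactor * rng.nextDouble();
        }
        ranked.sort(Comparator.comparingDouble((Peer p) -> p.score).reversed());
        int n = Math.min(extraPeerCount(dhtPeers), ranked.size());
        return new ArrayList<>(ranked.subList(0, n));
    }

    public static void main(String[] args) {
        List<Peer> candidates = new ArrayList<>();
        candidates.add(new Peer("A", new double[] { 1, 1, 1, 1, 1 }));
        candidates.add(new Peer("B", new double[] { 0, 0, 0, 0, 0 }));
        candidates.add(new Peer("C", new double[] { 1, 0, 0, 0, 0 }));
        for (Peer p : select(candidates, 4, 0.5, new Random())) {
            System.out.println(p.name + " " + p.score);
        }
    }
}
```

With a random factor of zero the ranking is fully determined by the weights; a larger factor spreads queries over more peers.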
reger
6932aa4d7a use configured admin-username for api calls
- the admin user name can be configured, in apiExec calls the default "admin" username is used. 

TODO: the bin/apicall.sh script should likely take that into account.
2014-01-07 21:26:50 +01:00
reger
c656e67c97 fix: display proper error msg on admin user change 2014-01-07 20:34:37 +01:00
orbiter
2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
as path for solr index dumps (instead of the SEGMENTS path). This will
make a maintenance of index backups easier. It will also provide a tool
to migrate from a freeworld index to a webportal index.
2014-01-07 17:53:49 +01:00
reger
30d925a96e reimplemented server access restriction
via Jetty IPAccessHandler to allow only configured IP's to access.
Handler is only loaded if a restriction is configured.

Since IPAccessHandler (Jetty 8) does not support IPv6, the system property java.net.preferIPv4Stack=true is set.
Testing showed system.setProperty seems to be sensitive to point of calling (earliest possible time seems to be best = early in yacy.main).
Moved the "isrunning..." just open browser check also to the new routine to preread the yacy.config only once.
2014-01-06 07:00:16 +01:00
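
The point about call order in the commit above can be illustrated with a small sketch. The class PreferIPv4Sketch and its method are hypothetical and this is not the actual yacy.main; only the property name java.net.preferIPv4Stack and the idea of setting it as early as possible come from the commit message. The JVM typically reads this property when its networking classes are first initialized, which would explain why a later call has no effect.

```java
// Hypothetical entry point, not the real yacy.main.
class PreferIPv4Sketch {

    // Sets the property; meant to run before any networking class is touched.
    static void preferIPv4() {
        System.setProperty("java.net.preferIPv4Stack", "true");
    }

    public static void main(String[] args) {
        preferIPv4(); // first statement, before any socket or address lookup
        System.out.println(System.getProperty("java.net.preferIPv4Stack"));
    }
}
```

Passing -Djava.net.preferIPv4Stack=true on the java command line is the usual alternative, since it takes effect before any application code runs.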
orbiter
3cb6c7861f fixed shutdown authentication problem 2014-01-06 01:48:54 +01:00
Michael Peter Christen
7005ecdabd cleanup 2014-01-05 15:06:40 +01:00
Michael Peter Christen
2939b47986 removed non-working realm setting in http client (auth for localhost was
added in previous commit)
2014-01-05 15:04:18 +01:00
Michael Peter Christen
9bd71fdbb4 made the access tracker class static because it shall be used by the
jetty auth module
2014-01-05 05:04:28 +01:00
Michael Peter Christen
7d6fc79eb8 refactoring (usage of constant names for attributes of authentication
check)
2014-01-05 04:23:44 +01:00
reger
cabe0943cd fix opensearch resultcount in yacysearch.rss
see merge request https://gitorious.org/yacy/rc1/merge_requests/24
use result count in searchtrailer.xml which is on p2p search more accurate (timing)
2014-01-04 17:14:10 +01:00
reger
eaf596a257 adding proxy status to (private) status box
(show also transparent and url proxy status)

show search result via url proxy only if status=on
2014-01-04 16:10:54 +01:00
reger
e3d8459906 extend ssl enabled msg on status page
- post the portnr
2014-01-03 02:56:09 +01:00
reger
58ecf5e4dd add to blacklist button in CrawlResults
http://bugs.yacy.net/view.php?id=220
introduced Blacklist.add with sourcefile only parameter
2014-01-01 11:01:22 +01:00
reger
17b454f957 fix external link (open in new tab) 2014-01-01 10:33:20 +01:00
reger
dd8ea0cdd6 fix "add to blacklist" button style in IndexControlRWIs_p
- added default filename filter to select field (as only addition to *.black list is permanent)

- modified Blacklist_p header/legend to show all active blacklists 
  (to support understanding that all configured lists are active)
- removed obsolete code in Blacklist_p servlet
2013-12-30 20:03:59 +01:00
orbiter
2861183359 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-12-29 13:04:14 +01:00
orbiter
4035e20f0b unescaping the path 2013-12-29 13:03:33 +01:00
orbiter
7e21d1ff70 "inaccessible" better describes the state of a server which cannot be
reached (while 30c3: too many users)
2013-12-29 12:40:34 +01:00
reger
7f9b9315fe Merge origin/master 2013-12-29 02:05:07 +01:00
reger
8eaabb9600 remove dependency from old serverCore.java
- remaining getPortNr not needed 
  (as current release allows only to set plain integer as port,
   see ConfigBasic)
2013-12-29 02:00:44 +01:00
orbiter
2018e55f8b switched back on index deletion (was accidentally off because new jetty
framework delivers never null to post arguments .. there may be more of
that kind of problems)
2013-12-29 01:39:30 +01:00
orbiter
d4942ad5e0 startRecord fix; this is not according to SRU definition because this
states that the first record has number 0; but +1 is not consistent with
other places where the number is used.
2013-12-28 23:34:43 +01:00
reger
3d913558ab display configured adminUserName in ConfigAccounts_p
- fix read default username in loginservice
2013-12-27 21:04:14 +01:00
reger
fbdd89e198 Merge origin/master 2013-12-27 06:53:14 +01:00
reger
65a2f3d5e7 tweak Jetty credentials to work with YaCy UserDB
- user entry in UserDB with admin right can login to access protected pages
- ditto for the admin user; the chosen username is stored in conf (adminAccountUserName=)
2013-12-27 06:45:22 +01:00
Michael Peter Christen
ee17bd0b69 added option to attach remote solr servers in read-only mode 2013-12-27 02:55:21 +01:00
Michael Peter Christen
25f9c35033 add patch which shall prevent that naive search mistakes like usage of
regular expressions cause no results. Usage of '*' followed by a dot or
any expression will now cause that this expression is used as a filetype
search.
2013-12-27 00:34:55 +01:00
reger
e05320b776 upd: to open more external links in new browser-tab 2013-12-26 01:16:53 +01:00
reger
cbb5dc01e4 remove obsolete htroot/solr htroot/gsa YaCy-servlets
- now handled by standard servlets
2013-12-25 22:53:11 +01:00
reger
71cac1a278 added SSL/HTTPS connector to support SSL/https connection on port 8443
!!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used
   - added ping test for above to migration 

- as of now port for https is hardcoded to default 8443
- if not urgently required I'd leave it this way (it's standard) to use different ports for http and https 

- post https port on ConfigBasic.html (if active)
2013-12-25 05:20:13 +01:00
reger
f681ce15ae remove obsolete HTTPServer input field 2013-12-24 05:11:31 +01:00
Michael Peter Christen
20b48f894f refactoring: moving all servlets to the same package (the solr servlet
is currently actually a filter which should be changed somehow)
2013-12-23 01:32:29 +01:00
Michael Peter Christen
84167adb49 removed unused anomichttpd code after migration to jetty 2013-12-23 01:23:40 +01:00
Michael Peter Christen
b461a27abb fixed the SolrServlet 2013-12-20 01:51:51 +01:00
Michael Peter Christen
7603e879dc Merge branch 'master' into HEAD
Conflicts:
	.classpath
	source/net/yacy/cora/federate/solr/SolrServlet.java
2013-12-20 01:19:06 +01:00
Michael Peter Christen
25250405f1 solr servlet preparation for join with jetty branch 2013-12-20 00:45:58 +01:00
reger
c84c313fe1 Merge origin/master into jetty 2013-12-14 20:02:24 +01:00
Michael Peter Christen
74466d731a use pre-compiled patterns in ymark 2013-12-12 11:50:48 +01:00
Michael Peter Christen
09412ea3a4 counting search requests in solr interface 2013-12-12 03:37:19 +01:00
Michael Peter Christen
67e7dc0cc6 added more properties to seedlist servlet 2013-12-06 14:30:47 +01:00
Michael Peter Christen
79771c60c0 IPv6 fixes 2013-12-06 14:30:08 +01:00
reger
92d9c56f9f Merge origin/master into jetty 2013-12-05 22:53:29 +01:00
Michael Peter Christen
da380343c2 perform greedy learning heuristic only if load < 1.0 2013-12-04 22:44:51 +01:00
Michael Peter Christen
81926c055d fixed bug with image search in yacyinteractive 2013-12-04 18:44:23 +01:00
Michael Peter Christen
edda0699e4 changed default timeout for port scanner 2013-12-04 18:13:43 +01:00
Michael Peter Christen
f1b5db2c45 - performance graph does not show peer ping in memory monitor any more
- after a forced GC, the PerformanceMemory view switches to automatic
update by default
2013-12-04 12:59:30 +01:00
Michael Peter Christen
0db8e34625 enhanced webgraph processing 2013-12-04 01:54:45 +01:00
Michael Peter Christen
9d8b32c63a fixed a division by zero 2013-12-04 01:54:14 +01:00
Michael Peter Christen
957f6297fb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-30 01:46:03 +01:00
reger
effea4bca0 Merge origin/master into jetty
Conflicts:
	source/net/yacy/cora/federate/solr/SolrServlet.java
2013-11-29 22:39:52 +01:00
reger
b49e90d2e9 remove reference to solrServlet from YaCy servlet select
- reference is not used
- solrServlet is used in Jetty branch and adjustments there conflict with unused solrServlet here.
2013-11-29 22:10:14 +01:00
Michael Peter Christen
38e1e3a707 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-29 02:46:38 +01:00
sixcooler
2c2ebb0d92 tried some hardening in order not letting any Solr-Searchers open 2013-11-29 02:40:12 +01:00
Michael Peter Christen
cca79d12ef setting of some default values to make a client development start easy
using the description at
http://www.yacy-websuche.de/wiki/index.php/Dev:APIhello
2013-11-29 01:28:48 +01:00
Michael Peter Christen
3d4b5e66ce disallow remote robots to crawl the HostBrowser servlet 2013-11-26 07:06:25 +01:00
Michael Peter Christen
234ca720f5 only admins should be able to force a commit 2013-11-26 07:03:20 +01:00
Michael Peter Christen
2c39b65409 fixes for searches containing stopwords. The fix was done using a
reconstruction of the search word set access method to prevent
words from being deleted from the sets from outside of the QueryGoal class.
2013-11-26 02:24:47 +01:00
orbiter
61409788eb less word hash computations (removing some overhead because of MD5
calcs) using the clear word in a normalized form.
2013-11-25 15:20:54 +01:00
reger
5c4a3d1c01 Merge origin/master into jetty 2013-11-24 21:00:39 +01:00
Michael Peter Christen
caa20d63d9 fixed seedlist (hash was missing) 2013-11-22 14:15:52 +01:00
Michael Peter Christen
ccf2f4e43b refactoring of seed attributes (introduced more constants) 2013-11-22 14:15:31 +01:00
Michael Peter Christen
c927b428d3 fixed json 2013-11-22 10:07:08 +01:00
Michael Peter Christen
64048ff217 fix for XSS 2013-11-22 09:53:32 +01:00
orbiter
b7f1e5af51 added new servlet which generates the same file as the principal peers
upload to a bootstrap position
 you can call it either with
 http://localhost:8090/yacy/seedlist.html
 or to generate json (or jsonp) with
 http://localhost:8090/yacy/seedlist.json
 http://localhost:8090/yacy/seedlist.json?callback=seedlist
2013-11-19 15:56:10 +01:00
orbiter
3e552550d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-18 22:48:00 +01:00
orbiter
c2d720cdaf purge a lucene cache - possible memory leak fix 2013-11-18 22:47:35 +01:00
reger
f111f30ace Merge origin/master into jetty 2013-11-17 00:18:25 +01:00
Michael Peter Christen
f4172cbb3d fix for another XSS bug 2013-11-17 00:17:25 +01:00
orbiter
ff86cb683f fixed some XSS bugs reported by Marius from http://ctf365.com/ 2013-11-16 20:34:31 +01:00
orbiter
19a051bec8 more monitoring for postprocessing and enhanced layout in Crawler
monitor page
2013-11-16 18:23:14 +01:00
Michael Peter Christen
fceac8cffd more monitoring for postprocessing 2013-11-16 08:23:42 +01:00
Michael Peter Christen
9d5895f643 enhanced and fixed postprocessing 2013-11-15 15:41:12 +01:00
Michael Peter Christen
087df05e24 added option to Config_Network_p.html to enable remote search while
DHT-Receive is switched off.
2013-11-13 13:38:01 +01:00
Michael Peter Christen
1a4a69c226 set more logger to 'final static' 2013-11-13 06:18:48 +01:00
Michael Peter Christen
69b8d61c47 fix for search requests in GSA interface which contain 'funny'
characters (like ':' etc.)
2013-11-12 15:54:54 +01:00
orbiter
4234b0ed6c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-10 18:50:43 +01:00
orbiter
74c86a72a0 better default value for crawler user agent 2013-11-10 18:48:00 +01:00
reger
1437c45383 merge rc1/master 2013-11-07 21:30:17 +01:00
Michael Peter Christen
87a956e881 calculating and showing the number of files and the average size of a
file in the HTCACHE in ConfigHTCache_p.html
2013-11-07 12:13:12 +01:00
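
A calculation of this kind (file count and average file size below a cache directory) could be sketched as follows. This is an illustration under stated assumptions, not the code behind ConfigHTCache_p.html: the class CacheStatsSketch, the method stats and the use of java.nio.file.Files.walk are choices made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of YaCy: counts files and averages their size.
class CacheStatsSketch {

    // Returns {fileCount, averageSizeInBytes}; the average is 0 for a directory without files.
    static long[] stats(Path dir) throws IOException {
        long count = 0;
        long total = 0;
        // Files.walk visits the directory recursively; the stream must be closed.
        try (Stream<Path> paths = Files.walk(dir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    count++;
                    total += Files.size(p);
                }
            }
        }
        return new long[] { count, count == 0 ? 0 : total / count };
    }

    public static void main(String[] args) throws IOException {
        // The directory is passed as an argument; the current directory is a placeholder default.
        long[] s = stats(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(s[0] + " files, average " + s[1] + " bytes");
    }
}
```

The average uses integer division, so it is rounded down to whole bytes.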
Michael Peter Christen
acc1f8a749 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-07 12:01:37 +01:00
Michael Peter Christen
81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes them completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
2013-11-07 10:01:44 +01:00
sixcooler
987f410011 URL-export:add query and fix for cast-class-exception 2013-11-06 19:22:26 +01:00
Michael Peter Christen
ffe8276063 replaced referrer link masking to 'pure' links to the referring page
(that was more useful during testing)
2013-11-06 18:05:46 +01:00
reger
b38de92a16 Merge origin/master into jetty 2013-11-02 00:48:42 +01:00
Michael Peter Christen
434e13b46d in host browser also show the properties of failed documents including
referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)
2013-11-01 13:30:53 +01:00
orbiter
1ac504ae51 use html encoding for urls in metadata 2013-10-31 16:16:29 +01:00
reger
f017066197 Merge origin/master into jetty 2013-10-27 15:09:24 +01:00
Michael Peter Christen
25951cee14 - fixed opensearchdescription, this delivered an url with missing
'global' option
- added display=2 to compare_yacy to remove the superfluous border
2013-10-26 00:34:55 +02:00
Michael Peter Christen
f1bfe64361 integrated startpage to compare_yacy 2013-10-26 00:33:36 +02:00
Michael Peter Christen
2f57327f20 added boolean load property to CacheResource_p servlet which causes that
the servlet loads the page from the web.
2013-10-26 00:15:25 +02:00
Michael Peter Christen
9bb7eab389 hacks to prevent storage of data longer than necessary during search and
some speed enhancements. This should reduce the memory usage during
heavy-load search a bit.
2013-10-25 15:05:30 +02:00
Michael Peter Christen
5afa6e3aee Automatically flush the log cache if a short memory status is reached.
For the default of 200 lines this can flush about 10MB.
2013-10-24 17:39:50 +02:00
Michael Peter Christen
030d0776ff Enhanced crawl start for very, very large crawl lists (i.e. > 5000)
which had a problem because of badly used concurrency.
This fix also caused a redesign of the whole host deletion process.
This should fix bug http://bugs.yacy.net/view.php?id=250
2013-10-24 16:20:20 +02:00
Michael Peter Christen
4948c39e48 added concurrency for mass crawl check 2013-10-23 11:27:19 +02:00
Michael Peter Christen
1b4fa2947d - fixed a problem which occurred when a document was not recognized with
the right content domain (i.e. identifying that it is an image, text
etc.) because it used the file extension and not an existing mime type
assignment.
- fixed the new setting that images shall be loaded for a better image
search.
- both fixes together makes it now possible to crawl
commons.wikimedia.org which makes use of 'funny' document names (i.e.
ending with .jpg while the document is html)
2013-10-23 00:16:54 +02:00
Michael Peter Christen
16e3b357b3 replaced old tag cloud and adapted design a bit 2013-10-22 14:20:17 +02:00
Michael Peter Christen
dc38d35986 added matching in url field in Table_API_p search 2013-10-22 12:46:10 +02:00
Michael Peter Christen
691d7e70fa added hint to development/commit rss feed 2013-10-21 15:16:29 +02:00
Michael Peter Christen
b81859c751 Show a RSS icon in the right top corner of search results. This replaces
the 'API' icon which was the link for the opensearch result which is an
extension of RSS. Since it is more appropriate to visualize a RSS link
with an RSS icon, this API icon was changed here.
2013-10-21 15:10:58 +02:00
Michael Peter Christen
1a09771be8 fixed sitemap crawl start 2013-10-21 12:49:32 +02:00
orbiter
b743e6d79f - prevent that crawl filter have empty (never-match) content
- rewrite the description of the options "Restrict to start domain(s)"
and "Restrict to sub-path(s)" to an explanation, that the restriction
applies to all links in the link list of the option "From Link-List of
URL" if this option is selected
- allow "Restrict to sub-path(s)" if the "From Link-List of URL" is
selected. This is supported in the crawl start.
2013-10-18 14:14:13 +02:00
orbiter
f597fdb602 make it easier to filter properties (case insensitive) 2013-10-17 18:36:35 +02:00
reger
f46c723398 allow to choose used http server, YaCy-Anomic or Jetty
- defaults to Jetty (in this branch)
- add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking
2013-10-17 03:34:22 +02:00
reger
1adb4b8741 merge rc1/master 2013-10-16 03:02:21 +02:00
reger
37d24f3318 make use of declared static string ACTION_LOCATION 2013-10-16 02:25:39 +02:00
reger
eea504c117 update Info.plist
small DefaultServlet refactoring
2013-10-12 23:01:14 +02:00
reger
a44eede8b8 merge rc1/master 2013-10-11 01:50:25 +02:00
reger
54a0272338 searchpage javascript (latestinfo) causes reset of search statistic after moving to next page
- disabled call via setTimeout in yacysearch.html
2013-10-10 23:23:58 +02:00
Michael Peter Christen
91fa99e9bb added new icon/image for latest commit 2013-10-09 22:07:59 +02:00
Michael Peter Christen
9fac9249bc - replaced 'edit' link with a clone symbol in Table_API_p since that is
what it does: it clones the crawl, it does not change the crawl.
- moved the appearance of this clone link to the type column since this
makes it visible also if the URL column is not visible.
2013-10-09 22:07:32 +02:00
Michael Peter Christen
0f6db6ad5b Merge remote-tracking branch 'jensbees/crawlexpert-post' 2013-10-09 21:32:27 +02:00
Jens Bertram
3252c1ec39 Merge upstream/master into crawlexpert-post 2013-10-09 20:49:14 +02:00
Michael Peter Christen
90c8577840 enhanced ranking; patches to replace old ranking 2013-10-09 15:10:03 +02:00
bhoerdzn
a3824dfbaa check URL on initial load, if set 2013-10-09 13:52:44 +02:00
bhoerdzn
52f49d475b add a hidden field for "crawlingstart" since jQuery omits the submit button value 2013-10-09 13:38:20 +02:00
bhoerdzn
b0c0ec2dec link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler" 2013-10-09 12:55:42 +02:00
bhoerdzn
d64d45361c use integer types for boolean values 2013-10-09 12:42:04 +02:00
bhoerdzn
eda123d6fd remove debugging code intercepting post requests 2013-10-09 11:51:07 +02:00
bhoerdzn
5057f27bbd fix typo in parsing "cachePolicy" parameter 2013-10-09 11:41:15 +02:00
bhoerdzn
98f5c9018d Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load. 2013-10-09 11:32:17 +02:00
bhoerdzn
a6a62986d4 correct state handling for country code restriction 2013-10-09 10:42:35 +02:00
bhoerdzn
4066b85155 correctly set initial state for load filters 2013-10-09 10:36:08 +02:00
bhoerdzn
8c91c3e7cd set form boolean values to 0 & 1 instead of false & true 2013-10-09 10:05:51 +02:00
bhoerdzn
c27fabc88e fixed wrong parameter check 2013-10-09 10:00:16 +02:00
bhoerdzn
2214bf5396 Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation. 2013-10-09 09:48:00 +02:00
reger
71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6
- introduce a YaCyHttp interface to modularize/separate http server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
     - putting the version specific code in classes starting with Jetty8xxxx
     - moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets

- adjust other test cases/classes
2013-10-09 00:40:48 +02:00
orbiter
705b3338ee list more fields available for search and for ranking boosts 2013-10-08 18:15:35 +02:00
bhoerdzn
405878182f Use list template for all other option lists. Fixed some template expressions. 2013-10-08 15:04:31 +02:00
bhoerdzn
8e74098cd4 Use list template for "reloadIfOlderNumber". 2013-10-08 13:26:09 +02:00
bhoerdzn
52bad7b908 Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields. 2013-10-08 13:24:27 +02:00
Michael Peter Christen
e56aa4fe93 fixed search navigation 2013-10-07 23:51:08 +02:00
Michael Peter Christen
4fbc4740df removed warnings 2013-10-07 23:41:50 +02:00
bhoerdzn
45cf553bc3 try to guess default crawling mode, if none set 2013-10-07 13:13:22 +02:00
bhoerdzn
b4f0c822f2 assign strings before checking contents 2013-10-07 13:01:39 +02:00
bhoerdzn
499abe8f91 set default values for string parameters 2013-10-07 12:32:23 +02:00
bhoerdzn
42ea56eaad made CrawlStartExpert_p aware of post variables; extended template where needed 2013-10-07 11:25:59 +02:00
reger
c7c706fd9f merge with rc1/master 2013-09-30 03:46:39 +02:00
Michael Peter Christen
82bfd9e00a - crawl profiles shall be deleted from active and passive stacks if they
are deleted to terminate the crawl because otherwise the crawl will go
on after the load-from-passive stack policy.
- better check if a crawl is terminated using the loader queue.
2013-09-26 10:22:31 +02:00
orbiter
8ac2e8c8c9 added location navigator which makes the image for the map search
visible whenever a location is available in the search result.
To activate this, the search.navigation property in yacy.conf must be
modified to the new default values.
2013-09-24 11:26:51 +02:00
orbiter
d86d2be5c3 automatically removed Places autotagging if no location library is
wanted
2013-09-24 11:23:45 +02:00
reger
5c4ba9b5db merge rc1 master 2013-09-22 02:21:24 +02:00
reger
70c51775ae Merge remote-tracking branch 'origin/master' into jetty 2013-09-22 02:09:02 +02:00
orbiter
d2effd21db fix for npe during location search 2013-09-21 21:03:58 +02:00
Michael Peter Christen
e40671ddb7 better and consistent deletions for error urls 2013-09-17 15:52:57 +02:00
Michael Peter Christen
2602be8d1e - removed ZURL data structure; removed also the ZURL data file
- replaced load failure logging by information which is stored in Solr
- fixed a bug with crawling of feeds: added must-match pattern
application to feed urls to filter out such urls which shall not be in a
wanted domain
- delegatedURLs, which also used ZURLs, are now temporary objects in
memory
2013-09-17 15:27:02 +02:00
Michael Peter Christen
61c5e40687 - replaced the properties object in AnchorURL with distinct variables
for anchor attributes.
- as a result, large portions of the parser code had to be adapted
as well
- added a counter target_order_i for anchor links in webgraph
computation
2013-09-15 23:27:04 +02:00
Michael Peter Christen
5e31bad711 - the webgraph shall store all links which appear on a web page and not
all unique links! This made it necessary to adapt a large portion of the
parser and link processing classes to carry a different type of link
collection, which carries property attributes that are attached to web
anchors.
- introduction of a new URL class, AnchorURL
- the other url classes, DigestURI and MultiProtocolURI, have been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
2013-09-15 00:30:23 +02:00
reger
13fc86c960 Merge remote-tracking branch 'origin/master' into jetty 2013-09-14 21:10:24 +02:00
reger
127adbf5cf remove references to 10_http thread (legacy http server)
and add needed get/set function to jetty http server wrapper
2013-09-12 22:02:11 +02:00
Michael Peter Christen
3e22d05290 added option for daterange properties in GSA interface to use a left-
or right-open date range;
i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition
to daterange=2013-09-02..2013-09-09
2013-09-11 12:52:18 +02:00
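The open-ended daterange syntax described in this commit can be illustrated with a minimal sketch. This is not the code from the commit: the class name DateRangeSketch, its fields, and the choice of inclusive bounds are assumptions made only for illustration.

```java
import java.time.LocalDate;

// Hypothetical sketch, not YaCy code: parses "from..to" where either side may be empty.
class DateRangeSketch {
    final LocalDate from; // null means the range is open on the left
    final LocalDate to;   // null means the range is open on the right

    DateRangeSketch(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    // Accepts "2013-09-02..2013-09-09", "..2013-09-09" and "2013-09-02..".
    static DateRangeSketch parse(String value) {
        int sep = value.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '..' in daterange: " + value);
        }
        String left = value.substring(0, sep).trim();
        String right = value.substring(sep + 2).trim();
        return new DateRangeSketch(
                left.isEmpty() ? null : LocalDate.parse(left),
                right.isEmpty() ? null : LocalDate.parse(right));
    }

    // Assumption: both bounds are inclusive when present.
    boolean contains(LocalDate day) {
        return (from == null || !day.isBefore(from))
                && (to == null || !day.isAfter(to));
    }
}
```

With this sketch, DateRangeSketch.parse("..2013-09-09") accepts every date up to and including 2013-09-09, while DateRangeSketch.parse("2013-09-02..") accepts every date from 2013-09-02 onward.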
reger
36b7159282 - remove double initialization of jetty
- refactor some var assignments
2013-09-11 02:24:47 +02:00
reger
63ed04260a Merge remote-tracking branch 'origin/master' into jetty 2013-09-10 20:42:38 +02:00
Michael Peter Christen
35ab2cef7b added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in
html meta fields to get a correct (or: better) date timestamp. The
http:last-modified mostly does not work because it is set to the current
date by most CMS.
2013-09-10 10:31:57 +02:00
reger
aafef72a8a merged current rc1/master into jetty branch to allow further development with latest version
ServerSideIncludes and servlet return values need further work (for working jetty integration)
- TODO: added nasty quickfix to allow SSI -  needs further work
- TODO: YaCy servlet return values/parameters are not handled
2013-09-09 02:36:06 +02:00
Michael Peter Christen
dbef8ccfcb forced deletion of ZURL entries for a specific host for each host that
appears in the crawl url list
2013-09-05 13:22:16 +02:00
Michael Peter Christen
e137ff4171 refactoring (in preparation for new removeHost method) 2013-09-05 09:59:41 +02:00
Michael Peter Christen
9e12fdff23 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-09-03 12:22:57 +02:00
Michael Peter Christen
049c3b3f2e added an option to exclude image search results from text search. This
is on by default.
2013-09-03 11:14:23 +02:00
Michael Peter Christen
5d71a4c8bc fix for dc:description field 2013-09-03 07:54:49 +02:00
reger
392174de8c remove all_words, all_strings lists from QueryGoal
- only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only
2013-09-02 23:09:43 +02:00
Michael Peter Christen
cb85b22725 redesign of the image search process (with much better results,
unfortunately the index schema has changed and p2p image search will not
be much better until many people update)
2013-09-02 18:55:38 +02:00
Michael Peter Christen
6184fd9d9a fix for solr/gsa result logging 2013-09-02 08:05:42 +02:00
reger
29967102a2 optimized QueryGoal (reducing mem and computation by removing all_hashes)
- all_hashes used for text highlighting and word distance computation which can be done with include_hashes only
2013-09-02 04:19:53 +02:00
orbiter
f106345eef link strings should not be tokenized 2013-09-01 14:35:36 +02:00
orbiter
5b14bdfffd npe fix 2013-09-01 13:28:37 +02:00
orbiter
1ca4b9612c added special handling of the BinaryResponseWriter in the solr interface
which makes it possible to use solrj with the javabin format which is
much better (compressed, no xml overhead, java object streams) and
faster. Furthermore, this enables the 'shards' option in the solr
interface which connects one solr (YaCy) to another solr (YaCy) ad-hoc.
2013-09-01 13:11:40 +02:00
Michael Peter Christen
a88a62f7aa added a feature to set a collection for a crawl result based on a
regular expression on the url: the collection attribute for a crawl start
may now be either a token or a list of tokens, separated by ',' where a
token is either a string or a pair <string,pattern> where the string is
separated from the pattern by a ':' and the string is assigned to the
document as collection only if the pattern matches with the url.
2013-08-25 00:13:48 +02:00
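The token syntax described in this commit can be sketched roughly as follows. This is an illustration only, not the implementation from the commit: the class and method names are hypothetical, and it assumes that the first ':' in a token separates name and pattern, that patterns contain no ',', and that a pattern must match the whole url.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not YaCy code: resolves a collection attribute against one url.
class CollectionAssignmentSketch {

    // attribute example: "user,news:.*\.example\.org/news/.*"
    static List<String> collectionsFor(String attribute, String url) {
        List<String> result = new ArrayList<>();
        for (String token : attribute.split(",")) {
            token = token.trim();
            if (token.isEmpty()) {
                continue;
            }
            int sep = token.indexOf(':');
            if (sep < 0) {
                // plain string token: always assigned
                result.add(token);
                continue;
            }
            // <string,pattern> pair: assigned only if the pattern matches the url
            String name = token.substring(0, sep).trim();
            Pattern pattern = Pattern.compile(token.substring(sep + 1));
            if (pattern.matcher(url).matches()) { // assumption: full match, not find()
                result.add(name);
            }
        }
        return result;
    }
}
```

Splitting on ',' first keeps the sketch short, but it means a regular expression containing a comma would be cut apart; a real parser would need an escaping rule for that case.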
Michael Peter Christen
765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
in intranets and the internet can now choose to appear as Googlebot.
This is an essential necessity to be able to compete in the field of
commercial search appliances, since most web pages are these days
optimized only for Google and no other search platform any more. All
commercial search engine providers have a built-in fake-Google User
Agent to be able to get the same search index as Google can do. Without
the resistance against obeying robots.txt in this case, no
competition is possible any more. YaCy will always obey the robots.txt
when it is used for crawling the web in a peer-to-peer network, but to
establish a Search Appliance (like a Google Search Appliance, GSA) it is
necessary to be able to behave exactly like a Google crawler.
With this change, you will be able to switch the user agent when portal
or intranet mode is selected on a per-crawl-start basis. Every crawl start
can have a different user agent.
2013-08-22 14:23:47 +02:00
Michael Peter Christen
47b1c81d08 - refactoring
- generalized writing of url attributes to solr documents
- added more url attributes to error documents
2013-08-20 15:46:04 +02:00
Michael Peter Christen
e6b423c4d9 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-08-19 22:02:41 +02:00
reger
94bec24d14 add back menu to Surftips page (currently no menu is displayed) 2013-08-19 17:53:37 +02:00
Michael Peter Christen
1f299b0d42 removed link.gif as link button because this image is now shown
automatically for external links
2013-08-19 10:54:23 +02:00
Michael Peter Christen
48ddd50a6c html fix 2013-08-17 09:32:24 +02:00
reger
96ae332427 revert del _blank (last commit) in template 2013-08-15 00:15:01 +02:00
reger
43348a98a9 add some href target=_blank to ext. links with external icon 2013-08-15 00:05:32 +02:00
reger
82d81a57bd info msg if no embedded Solr http://bugs.yacy.net/view.php?id=279 2013-08-14 20:59:46 +02:00
reger
02fe8b43ba Field Re-Indexing: display list of fields in reindex queue
change servlet to display statistic on 1st click (instead of after refresh)
2013-08-11 04:51:29 +02:00
sixcooler
7f501b7c38 clear some caches before reporting low Memory
do not break lines in Network-table-rows
2013-08-08 14:38:26 +02:00
reger
070bf85b33 css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit 112836dcc9) 2013-08-04 05:37:20 +02:00
sixcooler
8a96140f92 fix / workaround for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750
+ Seed.hash should be final
2013-08-01 16:40:58 +02:00
Michael Peter Christen
2674d28ef4 protection against self-ping (may be caused by fraud attempts) 2013-08-01 12:35:44 +02:00
orbiter
f3d001c7ab more space in the about section 2013-08-01 11:49:07 +02:00
Michael Peter Christen
e879b97b0a added line to enhance debugging 2013-07-31 13:33:05 +02:00
Michael Peter Christen
76afcccaaf fix for default boolean post values: the default value MUST NOT be TRUE,
because it's normal that a boolean value is missing in the post argument
if a checkbox is not selected.
Added also some style enhancements to IndexFederated, removed the Solr
attachment manual and replaced it with a link to the wiki which explains
this in more detail.
2013-07-31 10:49:26 +02:00
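The reasoning about boolean post values can be shown in a small sketch. HTML forms leave an unchecked checkbox out of the submitted data entirely, so a missing key has to read as false. The helper below is hypothetical and not taken from the commit; treating "on" and "true" as the truthy values is an assumption.

```java
import java.util.Map;

// Hypothetical sketch, not YaCy code: reads a checkbox value from post arguments.
class BooleanPostSketch {

    // A missing key means the checkbox was not selected, so the result is false.
    static boolean flag(Map<String, String> post, String key) {
        String value = post.get(key);
        if (value == null) {
            return false;
        }
        // Assumption: "on" (browser default for checked boxes) and "true" count as set.
        return value.equalsIgnoreCase("on") || value.equalsIgnoreCase("true");
    }
}
```

If the default were true instead, a user who unchecks the box could never switch the option off, because the unchecked state and the missing argument are indistinguishable.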
orbiter
252c525709 fixed feed api servlet and enhanced RSSReader class 2013-07-31 06:18:30 +02:00
Marc Nause
112836dcc9 Improved external links.
*) image links will not be marked (if they have class "yacylogo" or
"forceNoExternalIcon")
*) external links in menu on left (and "fork me"-banner) will open in
new tab/window now
2013-07-30 21:40:37 +02:00
Marc Nause
d64a094f0e External links in HTML interface are marked as external with small icon.
*) added new icon
*) added CSS rules to mark all external links except search results
(target="_self")
2013-07-30 20:46:51 +02:00
Michael Peter Christen
58fe986cca Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-30 12:49:14 +02:00
Michael Peter Christen
cf12835f20 replaced the single-text description solr field with a multi-value
description_txt text field
2013-07-30 12:48:57 +02:00
sixcooler
7d53ac86a3 fix for Blacklist (-Administration) 2013-07-29 19:09:28 +02:00
orbiter
f425b2c61c re-try to fetch url after a soft commit 2013-07-27 10:56:02 +02:00
orbiter
bf0ad04e1b apply load limitation also to dht-in 2013-07-27 10:42:38 +02:00
Roland Haeder
b58ca8622d Some cleanups:
- added SKINS_PATH_DEFAULT in the same way as LISTS_PATH_DEFAULT was added
- Added 'final' keyword to a string
2013-07-27 10:13:57 +02:00
Roland Haeder
e2ee412160 Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS'
Conflicts:
	htroot/api/blacklists_p.java
2013-07-27 10:12:58 +02:00
Roland Haeder
ae19401af0 Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:59:09 +02:00
Roland Haeder
59225487ea Fix for blacklist export, also applied the filename filter here 2013-07-27 09:58:56 +02:00
Roland Haeder
952fc0e7bd Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block 2013-07-27 09:58:38 +02:00
Roland Haeder
060fec1577 Reuse Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:57:50 +02:00
Roland Haeder
29049c71f5 Possible fix for ticket http://bugs.yacy.net/view.php?id=270, the filter for only including *.black must be applied 2013-07-27 09:57:07 +02:00
Michael Peter Christen
4c242f9af9 always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
2013-07-25 12:17:29 +02:00
orbiter
9c681cc00d added segment sizes, postprocessing status and cpu load to crawler
monitor
2013-07-23 19:10:11 +02:00
orbiter
86b514cf46 added load info to status_p.xml 2013-07-23 18:20:07 +02:00
orbiter
056b42f5aa - added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index to that front-end
2013-07-23 18:03:33 +02:00
orbiter
6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 2013-07-23 16:46:44 +02:00
orbiter
e24016e30a added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
2013-07-22 17:45:12 +02:00
orbiter
232100301c removed double-occurring value assignments 2013-07-17 19:09:25 +02:00
Roland Haeder
aaedc0405d Fixes and avoidance of catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
CME=ConcurrentModificationException)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an already synced HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call

Conflicts:
	source/net/yacy/repository/LoaderDispatcher.java
2013-07-17 18:37:34 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Felix Ableitner
376f9cd9d0 Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure 2013-07-17 15:58:09 +02:00
Michael Peter Christen
89c0aa0e74 added collection_sxt to error documents 2013-07-17 15:20:56 +02:00
Michael Peter Christen
0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-17 12:42:06 +02:00
Michael Peter Christen
1fd006cc56 fixes using the embedded connector 2013-07-17 12:41:54 +02:00
orbiter
aba7cc5de7 added cpu load information to status page 2013-07-17 12:38:12 +02:00
Roland Haeder
59b4fdd5ad Merge remote-tracking branch 'upstream/master' 2013-07-13 15:12:51 +02:00
orbiter
5493389576 stealth mode shall only be available for authorized users, because
unauthorized users can otherwise be monitored by authorized users
2013-07-13 14:49:36 +02:00
Roland Haeder
ebbb3bc5c1 Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet 2013-07-13 13:12:36 +02:00
Michael Peter Christen
bcc623a843 refactoring of load_delay: this is a matter of client identification 2013-07-12 16:24:56 +02:00
orbiter
2be456e7fb added a postprocessing field into api/status_p.xml to show if the
postprocessing task is running at that time (status: busy) or not
(status:idle)
2013-07-12 14:29:22 +02:00
orbiter
575f913154 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-12 14:17:13 +02:00
orbiter
c4efb612e2 added list of crawls to status_p.xml 2013-07-12 14:16:51 +02:00
Lotus
bb6caa346c Do not allow automatic update in case YaCy is installed to the Program
Files folder on Windows. There are no permissions to write to that folder
and the update would fail.
2013-07-11 21:50:06 +02:00
orbiter
dac88561ae minimum access time has a tight connection to ClientIdentification,
therefore it is defined there.
2013-07-11 17:04:24 +02:00
Felix Ableitner
a020697d64 Fixed problems with blacklist entry insertion. 2013-07-11 13:10:23 +02:00
sixcooler
bff8c753c6 re-insert this file - was deleted by mistake
+ correct another case-typo
2013-07-10 18:32:12 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
jdk-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
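The queue-based approach described in this commit can be sketched as below. This is not the ConcurrentLog class itself: the class name QueuedLogSketch, its methods, and the shutdown marker are assumptions for illustration, and the sketch uses a single worker thread to drain the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch, not YaCy code: callers enqueue records, one worker writes them.
class QueuedLogSketch {
    // Marker record that tells the worker to stop after draining earlier entries.
    private static final LogRecord POISON = new LogRecord(Level.OFF, "shutdown");

    private final BlockingQueue<LogRecord> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    QueuedLogSketch() {
        worker = new Thread(() -> {
            try {
                LogRecord record;
                // take() blocks only the worker, never the threads that log
                while ((record = queue.take()) != POISON) {
                    Logger.getLogger(record.getLoggerName()).log(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Returns immediately: the caller only pays for an enqueue on an unbounded queue.
    void info(String loggerName, String message) {
        LogRecord record = new LogRecord(Level.INFO, message);
        record.setLoggerName(loggerName);
        queue.add(record);
    }

    // Writes out everything queued so far, then stops the worker.
    void shutdown() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off in this design is that messages are written slightly later than they are produced, and an unbounded queue can grow if producers outpace the worker; a bounded queue with a drop or block policy would be one way to cap that.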
orbiter
c79f687110 enhanced the network scanner: find more hosts automatically by removal
of common subdomains before application of protocol-specific prefix
2013-07-09 11:42:13 +02:00
orbiter
b4677d1cad fix for bug #252
the naming of the servlet was wrong, the bug may not be present on
systems where upper/lowercase matching is lazy (windows)
2013-07-09 10:50:47 +02:00
Michael Peter Christen
07261fe274 Merge remote-tracking branch 'nutomics/blacklist_structure' 2013-07-08 23:32:15 +02:00
Michael Peter Christen
dea71851d2 - better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
2013-07-08 16:29:30 +02:00
orbiter
9f0cc9b401 enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect ftp and www subdomains
2013-07-08 13:17:09 +02:00
orbiter
f8c28efd66 fix for rssTerminal coloring 2013-07-04 21:46:46 +02:00
Felix Ableitner
44f8fcf62e Changed class structure of Blacklist. 2013-07-04 18:37:57 +02:00
Michael Peter Christen
3054a6d4b9 added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
2013-07-04 17:12:19 +02:00
Michael Peter Christen
78af998f8f Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594' 2013-07-04 16:56:54 +02:00
Michael Peter Christen
57ffdfad4c added a crawl option to obey html-meta-robots-noindex. This is on by
default.
2013-07-03 14:50:06 +02:00
Felix Ableitner
fd90fcc4e0 Fixes #196. 2013-07-02 20:45:41 +02:00
Michael Peter Christen
f1c5338210 preparation for greedy crawl profiles and refactoring 2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474 adding the canonical tag to crawl queues 2013-07-01 13:09:41 +02:00
Michael Peter Christen
203921006a redesign of citation index storage 2013-06-30 02:11:46 +02:00
Michael Peter Christen
e92b9275ce Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:33:29 +02:00
Michael Peter Christen
56cdcfa2fa fixed greedy learning mode - global is not a search attribute in
searchitems
2013-06-28 15:33:19 +02:00
Michael Peter Christen
32aa1d4569 removed unused option for queries 2013-06-28 15:32:36 +02:00
Michael Peter Christen
0c5bed7e2c added configuration option for greedy learning function to ConfigPortal
servlet
2013-06-28 15:31:36 +02:00
sixcooler
5d1f619f07 possibly helpful closing of solr-requests 2013-06-28 15:19:50 +02:00
Michael Peter Christen
9d291764d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:03:25 +02:00