Commit Graph

4579 Commits

Author SHA1 Message Date
reger
070bf85b33 css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit 112836dcc9) 2013-08-04 05:37:20 +02:00
sixcooler
8a96140f92 fix / workaround for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750
+ Seed.hash should be final
2013-08-01 16:40:58 +02:00
Michael Peter Christen
2674d28ef4 protection against self-ping (may be caused by fraud attempts) 2013-08-01 12:35:44 +02:00
orbiter
f3d001c7ab more space in the about section 2013-08-01 11:49:07 +02:00
Michael Peter Christen
e879b97b0a added line to enhance debugging 2013-07-31 13:33:05 +02:00
Michael Peter Christen
76afcccaaf fix for default boolean post values: the default value MUST NOT be TRUE,
because it's normal that a boolean value is missing in the post argument
if a checkbox is not selected.
Added also some style enhancements to IndexFederated, removed the Solr
attachment manual and replaced it with a link to the wiki which explains
this in more detail.
2013-07-31 10:49:26 +02:00
orbiter
252c525709 fixed feed api servlet and enhanced RSSReader class 2013-07-31 06:18:30 +02:00
Marc Nause
112836dcc9 Improved external links.
*) image links will not be marked (if they have class "yacylogo" or
"forceNoExternalIcon")
*) external links in menu on left (and "fork me"-banner) will open in
new tab/window now
2013-07-30 21:40:37 +02:00
Marc Nause
d64a094f0e External links in HTML interface are marked as external with small icon.
*) added new icon
*) added CSS rules to mark all external links except search results
(target="_self")
2013-07-30 20:46:51 +02:00
Michael Peter Christen
58fe986cca Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-30 12:49:14 +02:00
Michael Peter Christen
cf12835f20 replaced the single-text description solr field with a multi-value
description_txt text field
2013-07-30 12:48:57 +02:00
sixcooler
7d53ac86a3 fix for Blacklist (-Administration) 2013-07-29 19:09:28 +02:00
orbiter
f425b2c61c re-try to fetch url after a soft commit 2013-07-27 10:56:02 +02:00
orbiter
bf0ad04e1b apply load limitation also to dht-in 2013-07-27 10:42:38 +02:00
Roland Haeder
b58ca8622d Some cleanups:
- added SKINS_PATH_DEFAULT in the same way LISTS_PATH_DEFAULT was added
- Added 'final' keyword to a string
2013-07-27 10:13:57 +02:00
Roland Haeder
e2ee412160 Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS'
Conflicts:
	htroot/api/blacklists_p.java
2013-07-27 10:12:58 +02:00
Roland Haeder
ae19401af0 Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:59:09 +02:00
Roland Haeder
59225487ea Fix for blacklist export, also applied the filename filter here 2013-07-27 09:58:56 +02:00
Roland Haeder
952fc0e7bd Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block 2013-07-27 09:58:38 +02:00
Roland Haeder
060fec1577 Reuse Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:57:50 +02:00
Roland Haeder
29049c71f5 Possible fix for ticket http://bugs.yacy.net/view.php?id=270, the filter for only including *.black must be applied 2013-07-27 09:57:07 +02:00
Michael Peter Christen
4c242f9af9 always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
2013-07-25 12:17:29 +02:00
orbiter
9c681cc00d added segment sizes, postprocessing status and cpu load to crawler
monitor
2013-07-23 19:10:11 +02:00
orbiter
86b514cf46 added load info to status_p.xml 2013-07-23 18:20:07 +02:00
orbiter
056b42f5aa - added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index to that front-end
2013-07-23 18:03:33 +02:00
orbiter
6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 2013-07-23 16:46:44 +02:00
orbiter
e24016e30a added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
2013-07-22 17:45:12 +02:00
orbiter
232100301c removed double-occurring value assignments 2013-07-17 19:09:25 +02:00
Roland Haeder
aaedc0405d Fixes and avoidance of catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
CME=ConcurrentModificationException)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an already synced HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call

Conflicts:
	source/net/yacy/repository/LoaderDispatcher.java
2013-07-17 18:37:34 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Felix Ableitner
376f9cd9d0 Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure 2013-07-17 15:58:09 +02:00
Michael Peter Christen
89c0aa0e74 added collection_sxt to error documents 2013-07-17 15:20:56 +02:00
Michael Peter Christen
0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-17 12:42:06 +02:00
Michael Peter Christen
1fd006cc56 fixes using the embedded connector 2013-07-17 12:41:54 +02:00
orbiter
aba7cc5de7 added cpu load information to status page 2013-07-17 12:38:12 +02:00
Roland Haeder
59b4fdd5ad Merge remote-tracking branch 'upstream/master' 2013-07-13 15:12:51 +02:00
orbiter
5493389576 stealth mode shall only be available for authorized users, because
unauthorized users can otherwise be monitored by authorized users
2013-07-13 14:49:36 +02:00
Roland Haeder
ebbb3bc5c1 Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet 2013-07-13 13:12:36 +02:00
Michael Peter Christen
bcc623a843 refactoring of load_delay: this is a matter of client identification 2013-07-12 16:24:56 +02:00
orbiter
2be456e7fb added a postprocessing field into api/status_p.xml to show if the
postprocessing task is running at that time (status: busy) or not
(status:idle)
2013-07-12 14:29:22 +02:00
orbiter
575f913154 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-12 14:17:13 +02:00
orbiter
c4efb612e2 added list of crawls to status_p.xml 2013-07-12 14:16:51 +02:00
Lotus
bb6caa346c Do not allow automatic update in case YaCy is installed to the Program
Files folder on Windows. There are no permissions to write that folder
and update would fail.
2013-07-11 21:50:06 +02:00
orbiter
dac88561ae minimum access time has a tight connection to ClientIdentification,
therefore it is defined there.
2013-07-11 17:04:24 +02:00
Felix Ableitner
a020697d64 Fixed problems with blacklist entry insertion. 2013-07-11 13:10:23 +02:00
sixcooler
bff8c753c6 re-insert this file - was deleted by mistake
+ correct another case-typo
2013-07-10 18:32:12 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
jdk-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
orbiter
c79f687110 enhanced the network scanner: find more hosts automatically by removal
of common subdomains before application of protocol-specific prefix
2013-07-09 11:42:13 +02:00
orbiter
b4677d1cad fix for bug #252
the naming of the servlet was wrong, the bug may not be present on
systems where upper/lowercase matching is lazy (windows)
2013-07-09 10:50:47 +02:00
Michael Peter Christen
07261fe274 Merge remote-tracking branch 'nutomics/blacklist_structure' 2013-07-08 23:32:15 +02:00
Michael Peter Christen
dea71851d2 - better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
2013-07-08 16:29:30 +02:00
orbiter
9f0cc9b401 enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31er subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
2013-07-08 13:17:09 +02:00
orbiter
f8c28efd66 fix for rssTerminal coloring 2013-07-04 21:46:46 +02:00
Felix Ableitner
44f8fcf62e Changed class structure of Blacklist. 2013-07-04 18:37:57 +02:00
Michael Peter Christen
3054a6d4b9 added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
2013-07-04 17:12:19 +02:00
Michael Peter Christen
78af998f8f Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594' 2013-07-04 16:56:54 +02:00
Michael Peter Christen
57ffdfad4c added a crawl option to obey html-meta-robots-noindex. This is on by
default.
2013-07-03 14:50:06 +02:00
Felix Ableitner
fd90fcc4e0 Fixes #196. 2013-07-02 20:45:41 +02:00
Michael Peter Christen
f1c5338210 preparation for greedy crawl profiles and refactoring 2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474 adding the canonical tag to crawl queues 2013-07-01 13:09:41 +02:00
Michael Peter Christen
203921006a redesign of citation index storage 2013-06-30 02:11:46 +02:00
Michael Peter Christen
e92b9275ce Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:33:29 +02:00
Michael Peter Christen
56cdcfa2fa fixed greedy learning mode - global is not a search attribute in
searchitems
2013-06-28 15:33:19 +02:00
Michael Peter Christen
32aa1d4569 removed unused option for queries 2013-06-28 15:32:36 +02:00
Michael Peter Christen
0c5bed7e2c added configuration option for greedy learning function to ConfigPortal
servlet
2013-06-28 15:31:36 +02:00
sixcooler
5d1f619f07 possible helpful closing of solr-requests 2013-06-28 15:19:50 +02:00
Michael Peter Christen
9d291764d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:03:25 +02:00
sixcooler
e5abccdfe4 added optimize-option 2013-06-28 14:51:37 +02:00
Michael Peter Christen
8ea6ddf636 removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
2013-06-28 14:17:14 +02:00
Michael Peter Christen
64140f35cd fix for solr requests if no query part is given (prevent npe) 2013-06-28 13:16:25 +02:00
Michael Peter Christen
23fb458963 - fix to gsa searchresult answer in case that no query part is given
- fix to gsa default number of results (is 'num')
2013-06-28 12:22:33 +02:00
Michael Peter Christen
660a196989 refactoring 2013-06-26 09:27:22 +02:00
Michael Peter Christen
54024958ac added url_file_name_s in query for live-search of urls 2013-06-25 16:36:05 +02:00
Michael Peter Christen
16d1d744fa added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which has now not the file name as last
part of the path list.

The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
2013-06-25 16:27:20 +02:00
Michael Peter Christen
f542cf7d9c fix for daterange: the to-date is inclusive 2013-06-21 15:47:12 +02:00
Michael Peter Christen
c36720d45f added daterange option to gsa api 2013-06-18 16:25:00 +02:00
Michael Peter Christen
4e3007f4a0 typo 2013-06-13 22:40:46 +02:00
Michael Peter Christen
2cb6b6bc21 added target="_blank" to shutdown links 2013-06-13 22:31:39 +02:00
orbiter
c8e94ad7c7 fix for citation search in case that the citation is very fresh 2013-06-13 18:27:57 +02:00
orbiter
57dcf68665 added a feed-back message inside the shutdown page 2013-06-13 14:44:47 +02:00
Michael Peter Christen
0600d510e1 show the citation report also in ViewFile 2013-06-13 13:22:43 +02:00
Michael Peter Christen
1a92b61d69 fixed usage of ViewFile which needs a commit before showing latest crawl
result pages.
2013-06-13 13:08:24 +02:00
Michael Peter Christen
570511f3c8 removed fields references_internal_id_sxt and
references_internal_url_sxt because they had been shown to be
superfluous. The citation of referrer in the host browser is possible
without them. Therefore now the host browser does not only show
internal, but also external referrer to each link.
2013-06-13 13:01:28 +02:00
Michael Peter Christen
fd1776a3b0 added a new 'Citations' function: each search result item can now be
explored for citations within other documents. A click on the
'Citations' link shows an analysis with all text lines in the document
each with a complete list of documents which contain the same line. A
second section shows the linking documents in ascending order of number
of citations from the original document. Because documents from
different hosts are most interesting here, they are listed at the top of
the page as possible 'copypasta' source.
2013-06-12 15:02:49 +02:00
Michael Peter Christen
1762911f57 added synchronizations and timeouts in solr api; missing
synchronizations in index modification methods causes deadlocks inside
solr.
2013-06-12 02:13:18 +02:00
Michael Peter Christen
2fd7bbb450 reduced load on solr; no seed update in Status and no exists-check in
HTTPLoader in case of redirects, that can be done using the htcache.
2013-06-12 00:14:55 +02:00
Michael Peter Christen
7ee71c2354 changed administration page headline to 'admnistration' 2013-06-12 00:12:04 +02:00
Michael Peter Christen
efd973d29d changed p2p/stealth mode text and links a bit 2013-06-11 16:50:34 +02:00
Michael Peter Christen
6115bef335 added a 'greedy learning' mechanism which will cause that a 'fresh'
yacy will load linked web pages from search results until the total
number of web pages reaches 15000. This shall give fresh peers a 'boost'
to get a personalized search index faster.
2013-06-11 14:42:30 +02:00
Michael Peter Christen
a5e328d7c5 new icons 2013-06-11 13:16:46 +02:00
Michael Peter Christen
b85db72a73 added another response writer which can present search result with
texts, separated by sentences. Then, these sentences can be used to
search again in the index for the same sentence. This can be used to
provide a tool for plagiarism-search. (not finished yet).
Try the following:
http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml
.. to search for 'flut' and show only sentences in the result documents
which contain the word 'wasser'.
Consider this like using a grep-tool on documents: you select the
documents by a search query and you grep sentences inside the found
documents with the 'grep' attribute.
2013-06-10 18:41:00 +02:00
Michael Peter Christen
5132bf719c added new buttons to search result page in p2p mode which show the
switch between p2p search and the 'stealth mode' which is simply a
non-p2p search within the p2p network. The functionality was there all
the time, but the switch to this was not very visible.
2013-06-10 16:22:00 +02:00
orbiter
2b320313d9 replaced yacydoc servlet usage by a solr result output using an html
output writer. This made the creation of a html result writer necessary
which is included in this commit. The yacydoc servlet was used to
present all metadata to a document, but the solr interface can serve for
this purpose in a much better way. All usages (except one) of yacydoc
were replaced by a solr call. This affects also the 'metadata' link
attached to search results.
2013-06-09 12:12:34 +02:00
orbiter
200769d0c6 show the cache link in search results only if there is actually a cache
entry stored in HTCACHE
2013-06-09 08:15:23 +02:00
Michael Peter Christen
f7e77a21bf Added a citation reference computation for intra-domain link structures.
While the values for the reference evaluation are computed, also a
backlink-structure can be discovered and written to the index as well.
The host browser has been extended to show such backlinks for each
presented link. The host browser therefore can now show information
about where a document is linked. The new citation reference is computed as
likelihood for a random click path with recursive usage of previously
computed likelihood. This process is repeated until the likelihood
converges to a specific number. This number is then normalized to a
ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to
rank popularity within intra-domain link structures.
2013-06-07 13:20:57 +02:00
Michael Peter Christen
fdcd4e6a6f fixes to index deletion: quoting of host name (a '-' may be part of the
url) and disabling the engage button when changing the url field at
'Delete by URL matching'
2013-06-07 08:52:07 +02:00
reger
7480e87386 - fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247
- append language setting specific stopword list

- remove unused OVERHANG stack type
2013-06-06 22:07:54 +02:00
orbiter
5c7ddc67fe in GSA api enable usage of solr fq-attribute together with GSA
site-attribute
2013-06-06 13:36:58 +02:00
Michael Peter Christen
eb9d0ba5b1 ranking and boost function update, small bugfixes, better default search
field for solr
2013-05-30 16:30:35 +02:00
Michael Peter Christen
5f92c68f1f removed block rank ranking and all YBR files in /ranking 2013-05-30 13:01:22 +02:00
Michael Peter Christen
164603b946 cleanup 2013-05-30 12:47:22 +02:00
Michael Peter Christen
0c1a018bbd removed 'later' tactic because it used too much RAM, reduced number of
soft commits, reduced caching size of search events, ensured that solr
results are processed before connection is closed to keep that stuff not
too long in RAM
2013-05-29 18:27:27 +02:00
Michael Peter Christen
709e9b8ce7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-05-29 13:49:42 +02:00
Michael Peter Christen
9e07447d47 added new link for SMW 2013-05-29 13:45:22 +02:00
Michael Peter Christen
3c04dd11de removed dead link 2013-05-29 13:42:38 +02:00
Michael Peter Christen
281959a2d7 added option to re-boot the embedded solr during run-time. Added also
API recording for this method so it can be repeated automatically. The
index dump generation is now also available for API recording. Added
some synchronization in backend which was necessary for this.
2013-05-29 13:09:34 +02:00
Michael Peter Christen
80a7989e8c fixed ClassCastException: [Ljava.lang.Object; cannot be cast to
[Ljava.util.List; in robots.txt servlet
2013-05-29 12:02:19 +02:00
orbiter
da621e827e prevent NPE in case RWI is disabled 2013-05-28 16:26:38 +02:00
Michael Peter Christen
7300d81f40 include API Table deletion requests to the API recorder 2013-05-28 11:35:56 +02:00
Michael Peter Christen
d2ade87b49 fixed missing thisaddress in yacysearch.html which caused that the
opensearch link was not working
2013-05-28 10:33:41 +02:00
Michael Peter Christen
179d032181 added a (badly formatted) delete button for process scheduler entries 2013-05-27 16:15:58 +02:00
reger
c03f75ebc3 fix DHT url receive see http://bugs.yacy.net/view.php?id=242 2013-05-26 03:24:32 +02:00
Marc Nause
8fb1b1e290 *) simplified banner creation code 2013-05-25 12:56:43 +02:00
Marc Nause
cd0b5f31b4 *) updated links to description of regex 2013-05-25 11:08:06 +02:00
Michael Peter Christen
8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
reduced time-out of robots.txt load limit
2013-05-20 22:05:28 +02:00
Michael Peter Christen
f93501e6e0 nice crawl name if crawl is started with file:// (was: null) 2013-05-20 11:25:26 +02:00
Michael Peter Christen
b4f0cac102 added the reindexing job servlet to the submenu structure 2013-05-20 11:02:21 +02:00
Michael Peter Christen
8dbc80da70 redesign of index.exist-test: this shall now not be done using a single
id to be tested, but with a collection of ids. This will cause only a
single call to solr instead of many. The result is a much better
performance when testing the existence of many urls. The effect should
cause much less IO during index transmission, both on sender and
receiver side.
2013-05-17 13:59:37 +02:00
Michael Peter Christen
c91c67c3cd reject bad solr requests 2013-05-15 22:42:05 +02:00
Michael Peter Christen
44e363f37f refactoring of WorkflowProcessor, added process counter, update of
process counter if a blocking thread dies. Added also a new column in
PerformanceConcurrency_p servlet to show the actual number of concurrent
processes.
2013-05-13 13:28:07 +02:00
reger
79401cb938 added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html)
this allows to remove obsolete fields from the index (according to current schema config)
by selecting all documents containing disabled fields.
2013-05-13 04:06:57 +02:00
Michael Peter Christen
b24d1d18e4 removed synchronization and concurrency in Fulltext class, concurrent
deletions are now handled in ConcurrentUpdateSolrConnector
2013-05-11 10:53:12 +02:00
Michael Peter Christen
f965d04496 added new peer icons for Mentor peers and Mentee peers (not used yet) 2013-05-10 17:33:02 +02:00
Michael Peter Christen
b9b446bca6 - added ssl configuration sign (a lock) to network statistic/table
- fixed a bug in bitfield
2013-05-10 17:32:21 +02:00
Michael Peter Christen
7095446ad3 added checkbox (near port) to switch on ssl support (https access) to
the admin interface.
2013-05-10 13:49:46 +02:00
Michael Peter Christen
e6c8b545c2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-05-10 12:16:55 +02:00
orbiter
4baa0d4a97 Added a default keystore for ssl encryption of the YaCy web interface.
This will enable https-access to YaCy, but this feature is disabled by
default using the new server.https=false attribute. This has two
purposes:
- make it easier for everyone to use https (just set server.https=true)
- provide the basis for secure yacy-to-yacy communication in the future
2013-05-10 12:02:31 +02:00
Michael Peter Christen
038f956821 fix for sitemap detection: the sitemap url was not visible if it
appeared after the declaration of robots allow/deny for the crawler
because the sitemap parser terminated after the allow/deny rules had
been found. Now the parser reads the robots.txt until the end to
discover also sitemap rules at the end of the file.
2013-05-10 04:56:58 +02:00
Michael Peter Christen
e26bdd4a52 fixes to deletion methods (removed unnecessary concurrency and added
removal of crawl queue entries)
2013-05-08 13:26:25 +02:00
Michael Peter Christen
f7f3e28c5e prevent that the size of the index is computed too many times.
Because the index size is now provided by solr, and the only way to do
that is a match for [* TO *], a size computation is quite complex and
time-consuming. Therefore this patch prevents that the method is called
at all and if necessary puts a DOS-preventing barrier in front of it.
2013-05-08 11:50:46 +02:00
Michael Peter Christen
cca19d94d4 re-declared some fields to be of type string rather than text which
makes them more efficient and less large
2013-05-06 16:45:54 +02:00
Michael Peter Christen
ed1d5bace6 draw the names of other peers which receive/send dht into the network
graphic
2013-05-06 14:27:39 +02:00
Michael Peter Christen
b528448332 enlarge network graph circle according to image height and reduce the
image height in the Network servlet. Overall, the image is now larger
but takes less space on the web page.
2013-05-05 23:39:46 +02:00
Michael Peter Christen
f1bb54943e typo 2013-05-04 09:34:06 +02:00
Michael Peter Christen
d7fd346917 - added regular-expression based deletions
- on-demand collection-list generation for collection-based deletions
instead of a default collection-list presentation (this makes calling
the interface much faster since the computation of collections lists for
large indexes may take some seconds)
2013-05-04 01:14:10 +02:00
Michael Peter Christen
3841854c97 abstraction of catchall term 2013-05-04 00:14:22 +02:00
sixcooler
e145afb8d6 fix for PerformanceMemory showing UNRESOLVED_PATTERN by removing
solr-cache-stuff, which is not available anymore
2013-05-02 15:47:21 +02:00
Michael Peter Christen
1b102d98d8 - added index deletion to index administration submenu
- added index deletion processes to the process scheduler/recorder
2013-04-30 02:11:28 +02:00
Michael Peter Christen
0e2ee00fea added an index deletion servlet and some style changes for the
'dangerous' engage-button
2013-04-29 19:30:53 +02:00
Michael Peter Christen
e4f7e5bcfe fixed bad css change 2013-04-28 20:09:45 +02:00
Michael Peter Christen
3502b4c697 refactoring (renaming) of yacy-solr api 2013-04-27 01:32:18 +02:00
Michael Peter Christen
3a0fcfbeda Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-04-26 10:50:08 +02:00
Michael Peter Christen
25499eead5 - added a new field for the regular expression in crawl start
- added the field in crawl profile
- adapted logging and error management
- adapted duplicate document detection
- added a new rule to the indexing process to reject non-matching
content
- full redesign of the expert crawl start servlet
The new filter field can now be seen in /CrawlStartExpert_p.html at
Section "Document Filter", subsection item "Filter on Content of
Document"
2013-04-26 10:49:55 +02:00
reger
0a9b0992f3 RankingSolr_p: include warning if boost field not in local index 2013-04-26 02:26:38 +02:00
orbiter
e1bfe9d07a - reduction of the concurrently running processes to make YaCy more
adjusted to smaller and 1-core devices.
- the workflow processor now starts no process at all. these are started
as soon as parser/condenser/indexing queues are filled.
- better abstraction
2013-04-25 11:33:17 +02:00
Michael Peter Christen
c091000165 added collection attribute also to the rss feed reader 2013-04-24 01:14:35 +02:00
orbiter
f7571386a3 added a 'collection' property attribute in yacysearch.html which can be
used to select between different collections as defined during a crawl
start with the 'collection' attribute. This actually implements the
ability to prepare search tenants which restrict their search results to
a specific collection. The main use for this is to provide tenants to
the yaml4 interface (at this time).
2013-04-23 20:42:54 +02:00
orbiter
3e79bd4b1f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-04-23 12:15:46 +02:00
orbiter
d571e739b6 increased row limitation for authorized users from 10000 to 100000000 in
solr interface
2013-04-23 12:15:33 +02:00
Michael Peter Christen
a1fffe8e86 fixed default ranking values 2013-04-21 12:27:27 +02:00
Michael Peter Christen
1d30082446 added hindi translation configuration 2013-04-17 12:57:27 +02:00
Michael Peter Christen
97775fbebc fixed ranking for add-function queries: this did not work. The option
was removed. All function queries are now boosts (multiplies the score
according to a function). This is also the recommended way to boost
rankings based on functions as explained in
http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
2013-04-16 14:45:14 +02:00
Michael Peter Christen
298bf2deb5 fix to ranking configuration servlet 2013-04-16 12:38:16 +02:00
Michael Peter Christen
2db058b551 added in RankingSolr_p.html a select box to switch between different
ranking situations. By default, four situations can be configured.
2013-04-16 11:38:51 +02:00
Michael Peter Christen
6fbca35215 fixed api table navigation 2013-04-16 01:39:30 +02:00
Michael Peter Christen
f24ac518e6 redesign of exists()-query (can now be called with query) and the
CachedSolrConnector which based its cache on the key value. This will be
used to correct the title_unique_b and description_unique_b field.
2013-04-15 14:08:30 +02:00
Michael Peter Christen
27d6222880 added new field host_extent_i which, after a crawl and postprocessing,
holds the number of documents for the host where the document is hosted.
This is necessary for ranking and the norming of references per local
host in the ranking computation.
2013-04-14 20:52:40 +02:00
Michael Peter Christen
579eb01a49 showing now the details of references count in host browser:
external (ext), internal (int) and external hosts (hosts) for each
indexed document.
2013-04-14 11:30:57 +02:00
reger
0f4237d8e5 add admin option to delete load errors from index 2013-04-14 05:33:01 +02:00
Marc Nause
e99c8789ff *) fixed encoding of query in link to map (in case geolocalization is
enabled, "Show search results for "köln" on map")
*) applied suggestions of Checkstyle plugin
2013-04-13 21:50:48 +02:00
Michael Peter Christen
082e3274d6 - setting the same default ranking in the solr interface as for YaCy
search interfaces if no other ranking attributes are given
- using the YaCy ranking in the GSA interface only if there was not
given a GSA-style sort attribute
- to avoid confusion about correct ranking attributes, only the default
'0'-ranking profile is used and not scenario-adopted (site, date)
because that should be configurable in the web interface before it is
used actually for ranking.
2013-04-12 10:48:41 +02:00
Michael Peter Christen
edc0b33f6d - showing references count and clickdepth in host browser
- fixed generation and presentation of both values
2013-04-11 14:46:13 +02:00
orbiter
2c3b024196 if the crawl was paused (automatically), show the reason for pausing in
the Crawler_p servlet.
2013-04-09 18:55:26 +02:00
reger
566a3b0294 fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search
- removed duplicate QueryParams.hashes2Handles , redundant  with .hashes2Set
2013-04-08 21:25:21 +02:00
reger
40b3f2c5fe comment out dead menu link 2013-04-06 02:34:56 +02:00
reger
bf1e1ddca1 fix typo in prev commit 2013-04-06 02:29:49 +02:00
reger
d4d93be779 uncomment "used time" calculation for remote search log 2013-04-06 02:08:01 +02:00
reger
36202f27b0 improve remote search log, set "Returned Results" to transmitcount (instead of no value) 2013-04-05 03:33:33 +02:00
reger
254074b11d Merge branch 'master' of git://gitorious.org/yacy/rc1.git 2013-03-22 03:46:26 +01:00
Michael Peter Christen
870aedf3c6 fixes for better search interface integration in yaml templates 2013-03-20 16:19:49 +01:00
Michael Peter Christen
735eb70525 better search timing; prevents '0 results' for very large local
indexes >> 10 mio documents
2013-03-19 11:23:18 +01:00
Michael Peter Christen
342ba1049b - callback fix
- memory allocation problem in RowCollection: if memory is too low, do
not try to increase by 1 because this leads to very long execution
time and at the end to the same OOM as if we allocate the memory at the
moment we need it even if the resource observer states that this memory
is not there. To compensate this, the increase size is reduced.
2013-03-19 10:32:01 +01:00
reger
31d16f20d7 fix invisible icon not found 2013-03-18 00:10:23 +01:00
orbiter
243b66ae6d Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy 2013-03-17 13:39:31 +01:00
Frank
7763f2554f add the new PPMbar in Crawler_p for a better style and better use. 2013-03-17 11:43:12 +01:00
orbiter
e4d26d1cb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-17 10:52:42 +01:00
orbiter
940c6849ee enhanced did-you-mean (a bit): can now remember previously searched
words (plus small enhancements)
2013-03-17 10:52:31 +01:00
reger
d57b221921 add: reset Solr schema field selection to default button in IndexSchema_p 2013-03-17 03:46:29 +01:00
Michael Peter Christen
9406a2e438 fixed NPE during index abstract computation 2013-03-15 10:04:27 +01:00
Michael Peter Christen
d725782440 turned severe message to warning message about network failure events 2013-03-15 09:40:02 +01:00
Michael Peter Christen
2d36a7eaf5 - do not create a new query for all remote peers
- no document search this time
- adjusted banner and network to not show 'WORDS' but DHT Chunks. This
is to avoid confusion for robinson peers which do not create Word
Entries
2013-03-15 00:14:28 +01:00
Michael Peter Christen
2080fc7406 removed unused tag fields 2013-03-14 10:35:21 +01:00
reger
7804c12976 fix error msg in ConfigHeuristics_p 2013-03-14 03:30:25 +01:00
reger
230a12bfe2 adjust Opensearch discover function to new webgraph Solr schema 2013-03-14 03:10:54 +01:00
orbiter
47114910d5 fix for possible memory leaks 2013-03-13 17:55:37 +01:00
Michael Peter Christen
addba047e2 changes in ranking computation
- an existing ranking servlet for solr was extended. It is now possible
to set boost values for fields, boost functions and boost queries.
- The ranking can have different instances, but currently only the first
one is used
- added an abstraction layer for fields which can be used for search and
those fields can be edited in the solr ranking configuration
- the ranking value from solr within the field score is used to combine
remote search requests, which all are created using the same locally
defined boost values
- reduced the number of fields which are used for search (makes it
faster)
- replaced some text fields by string fields (makes indexing faster)
- removed classes which had no use
- made a large number of experiments for a better ranking and created a
temporary setting which prefers hits inside titles
- adjusted also the RWI-based ranking computation to 'prefer title'
- made special cases like for portal search where no post-processing and
post-ranking is wanted: this keeps the original ranking order as done by
Solr
- fixed many bugs with old settings for ranking
2013-03-13 14:47:00 +01:00
Michael Peter Christen
68e739a90b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-10 02:29:38 +01:00
Michael Peter Christen
3d9ce9cd04 - added more selection criteria for network seed list
- enhanced up script
2013-03-10 02:26:24 +01:00
orbiter
168e8d9b4d added/fixed missing DOCTYPE line (submitted by Thomas) 2013-03-08 14:40:09 +01:00
Michael Peter Christen
25300913fa fixes to search debugging after testing with the different search
debugging options
2013-03-05 21:28:22 +01:00
Michael Peter Christen
2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed
search cache to prevent that too much IO load is on the peer during
search.
2013-03-04 00:07:52 +01:00
Michael Peter Christen
221ed7d764 - enhanced concurrency during search without IO blocking
- introduced a second queue to flush remote search results (now: old
metadata structure from DHT peers)
- fixed result counters
2013-03-03 22:38:50 +01:00
Marc Nause
2714b59f38 *) For some reason this seems to fix a ClassCastException on my system
(OpenJDK).
2013-03-03 20:38:20 +01:00
orbiter
0f7ea7ad9f - enhanced solr.add procedure for mass adds
- removed unused solr access classes
- made snippet generation for documents from YaCy RWI/DHT concurrent (as
it was before the search process removation)
- reduced the number of remote results in settings file because the
processing of such mass documents add is too CPU-intensive (in Solr)
2013-03-01 15:27:17 +01:00
orbiter
7ff10bdb1b fix of page navigation for formatted totalcount numbers 2013-03-01 00:48:28 +01:00
orbiter
a734fbc4a5 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-27 22:44:57 +01:00
orbiter
d74472f562 corrected result counter 2013-02-27 22:40:23 +01:00
orbiter
aa3c26c62e added recrawl/reload to CrawlStartSite for a timeout of 3 days 2013-02-27 11:43:36 +01:00
orbiter
c1b7e61882 added option to create empty vocabularies 2013-02-27 08:24:37 +01:00
bubu
e0edad689d fix link to IndexSchema_p.html 2013-02-26 21:12:44 +01:00