Commit Graph

9296 Commits

Author SHA1 Message Date
Michael Peter Christen
1768c82010 removed field selection because it created documents containing only
that field, which was not useful when re-writing the same document
2013-01-24 03:26:38 +01:00
Michael Peter Christen
8eebeea533 fix for search result link in ViewFile 2013-01-24 01:50:59 +01:00
Dmitriy Kazimirov
5bed1a7893 Russian localization update 2013-01-23 14:43:00 +01:00
Michael Peter Christen
31e854bef6 Merge remote-tracking branch 'copro/master' 2013-01-23 14:41:17 +01:00
Michael Peter Christen
4735bd47f4 - changed the Solr commit call and added an optimize option. Since Solr
4.0.0 there is a new softcommit feature which implements a
near-real-time (NRT) search option. The softcommit does not do IO and
does not cause performance issues.
YaCy now has an extension in its Solr connectors to use the softcommit
feature. The softcommit call now replaces the hard commit everywhere one
was used. Furthermore, the commit strategy for searches from the web
interface was changed: a commit is now done before every search.

The softcommit feature was implemented because it was needed for the
following changes (customer demands), which are also included in this
git commit:

- added a feature to identify all documents which have unique titles
and/or unique descriptions. These unique flags are disabled by default.
- added also a feature to set a flag when the url from a canonical tag
is equal to the document url. This is also disabled by default.

To support the new softcommit strategy, the commitWithinMs option was
set to -1 to disable automatic commits based on document insert times. If
documents are inserted continuously, a commit would also happen
continuously whenever the commitWithinMs time is reached. This would
conflict with the regular autocommit of 10 minutes and the new
softcommit strategy.
2013-01-23 14:40:58 +01:00
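The unique-title and canonical flags described in commit 4735bd47f4 can be illustrated with a minimal in-memory sketch. This is not YaCy's implementation: the class `UniqueFlagSketch`, the `Doc` type and both method names are hypothetical, and the real feature presumably checks uniqueness against the Solr index (which is why freshly inserted documents need to be searchable) rather than against a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UniqueFlagSketch {

    /** Hypothetical minimal document holding only the fields this sketch needs. */
    static final class Doc {
        final String url;
        final String title;
        final String canonical; // URL from the canonical tag, may be null

        Doc(String url, String title, String canonical) {
            this.url = url;
            this.title = title;
            this.canonical = canonical;
        }
    }

    /** Returns one flag per document: true if no other document shares its title. */
    static List<Boolean> titleUniqueFlags(List<Doc> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : docs) {
            counts.merge(d.title, 1, Integer::sum); // count occurrences of each title
        }
        List<Boolean> flags = new ArrayList<>();
        for (Doc d : docs) {
            flags.add(counts.get(d.title) == 1);
        }
        return flags;
    }

    /** True if the canonical tag points at the document's own URL. */
    static boolean canonicalEqualsUrl(Doc d) {
        return d.canonical != null && d.canonical.equals(d.url);
    }
}
```

The same counting approach would apply to descriptions; only the compared field changes.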
Copro
0025983993 Fix typo embedd -> embed 2013-01-23 04:11:55 +01:00
Copro
3ea8380959 Adding Vimeo tag to wiki commands to embed Vimeo video with id 2013-01-23 04:00:15 +01:00
Copro
ee9d7fd93d Added feature to embed YouTube videos to wiki commands for usage in
Wiki, Blog or other servlets
2013-01-23 02:43:58 +01:00
Michael Peter Christen
ec927ea72b Merge remote-tracking branch 'reger/master' 2013-01-22 17:01:49 +01:00
Michael Peter Christen
7159ed2a7d Merge remote-tracking branch 'copro/master' 2013-01-22 17:01:18 +01:00
Copro
946fad48c7 Some more German translation reducing the amount of Unused String
messages
2013-01-22 15:33:49 +01:00
Aleksej
6690dac845 Russian translation fixes not merged due to conflict 2013-01-22 16:19:07 +04:00
Michael Peter Christen
9ccdd21d76 Merge remote-tracking branch 'aleksejs/fixtrans'
Conflicts:
	locales/ru.lng
	
Tried to merge this but I had to do this 'blind'.
Sorry if I deleted something that was right.
2013-01-22 11:54:38 +01:00
Copro
de7c3d95b4 Added German translation for HostBrowser.html 2013-01-22 05:14:37 +01:00
Dmitriy Kazimirov
5e5ae01909 updated Russian localization for update system 2013-01-21 18:07:18 +01:00
Dmitriy Kazimirov
f9c65078f0 A little more fixes for Russian localization 2013-01-21 18:07:08 +01:00
Dmitriy Kazimirov
ca01d225db A little more fixes for Russian localization 2013-01-21 18:07:00 +01:00
Dmitriy Kazimirov
9dc0bea1dc Little more correct and readable Russian localization 2013-01-21 18:06:51 +01:00
Dmitriy Kazimirov
c1b9113a68 Little more correct and readable Russian localization 2013-01-21 18:06:43 +01:00
Dmitriy Kazimirov
9cc72df176 More Russian translations. And if some text is not translated it will be in English and not German 2013-01-21 18:06:02 +01:00
Michael Peter Christen
db024a4e19 added new solr fields (unused yet; implementation will follow) 2013-01-21 18:02:29 +01:00
Michael Peter Christen
f5fd2aea18 removed archaic migration code 2013-01-21 17:59:42 +01:00
Michael Peter Christen
9b5bdae1b4 Reverted setting of MMapDirectoryFactory from solrconfig; see
http://forum.yacy-websuche.de/viewtopic.php?p=27509#p27509
Instead, the start script now checks whether the host is a 64-bit host and,
if so, sets -Dsolr.directoryFactory=solr.MMapDirectoryFactory as a java option

Reverted the ramBufferSizeMB setting (this was not enabled anyway)
because that may be too much memory for small peers and embedded
systems.

Activated the mergeFactor 4; this was commented out by mistake
2013-01-21 17:55:28 +01:00
reger
f8f7f33596 add Maven build script 2013-01-20 21:08:59 +01:00
orbiter
eb68a30947 solr performance settings
the target of these performance settings is the reduction of IO in
general and during search in particular.
- reduced mergeFactor to 4. This will increase the IO during indexing,
but will reduce IO during search. It will also greatly reduce the number
of open files which should make it possible to have overall larger
indexes until the number of open files in an OS is reached.
- increased ramBufferSizeMB to 256 MB. This will reduce the number of
commits. This change may compensate for the reduction of the mergeFactor.
- disabled updateLog. This is a real-time search feature which is
available in YaCy anyway because a commit is forced if index.html is
called. The updateLog feature causes a lot of IO during indexing and
search and produced a lot of files in SEGMENTS/solr_40/data/tlog
2013-01-19 11:21:33 +01:00
Michael Peter Christen
60f2a69331 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-01-17 21:53:19 +01:00
Michael Peter Christen
cba038f97b one more NPE fix 2013-01-17 21:52:56 +01:00
sixcooler
f3e705c4fe bump to httpclient / httpcore 4.2.3 (bugfix-release) 2013-01-17 20:10:49 +01:00
Michael Peter Christen
aa067da86b set the 'all' option as the last option in the list because the 'all' option
currently also selects lists which cannot be exported in xml correctly
2013-01-17 01:04:50 +01:00
Michael Peter Christen
af465cdca5 fix for wrong robots.txt loading for https protocol
see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4579
2013-01-16 17:38:06 +01:00
Michael Peter Christen
edbc86d2b0 integrated search term into opensearch result title. this makes better
bookmark names when subscribing to multiple search results from the same
peer
2013-01-16 16:18:03 +01:00
Michael Peter Christen
c3d50d91f8 relaxing site operator for www prefix:
- when using a site operator search for a domain where the domain has a
www prefix, the domain without the www is also included
- when using a site operator search for a domain where the domain has no
www prefix, the domain with the www is also included
- in the host navigator, all domains with and without a www prefix are
accumulated. That means that the host navigator never shows a host
with a www prefix.
This should prevent usage mistakes of the site operator.
2013-01-16 14:54:35 +01:00
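The www relaxation described in commit c3d50d91f8 can be sketched as follows. This is a minimal illustration, not the code from the commit: the class `SiteOperatorSketch` and the methods `hostVariants` and `navigatorKey` are hypothetical names chosen for this example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class SiteOperatorSketch {

    /** Hosts a site operator search should match: the given host plus its www/non-www twin. */
    static Set<String> hostVariants(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        Set<String> variants = new LinkedHashSet<>();
        variants.add(h);
        // add the counterpart: strip the prefix if present, otherwise prepend it
        variants.add(h.startsWith("www.") ? h.substring(4) : "www." + h);
        return variants;
    }

    /** Key under which a host navigator could accumulate counts: always without www. */
    static String navigatorKey(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }
}
```

With this sketch, `hostVariants("www.example.org")` and `hostVariants("example.org")` contain the same two hosts, and `navigatorKey` maps both to `example.org`.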
Michael Peter Christen
f53703df62 using MMapDirectoryFactory as solution for ClosedChannelException given
in https://issues.apache.org/jira/browse/SOLR-2247
2013-01-16 14:35:37 +01:00
Michael Peter Christen
db49e91724 fixed a NPE which may appear for freeworld peers without any rwi index
data. The NPE looked like this:
Caused by: java.lang.NullPointerException
	at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:279)
	at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:155)
	at search.respond(search.java:314)
	... 12 more
2013-01-16 11:07:20 +01:00
Michael Peter Christen
4faa07c214 added a timeout for topic computation (solr is here much slower than the
old metadata-db)
2013-01-15 16:20:43 +01:00
Michael Peter Christen
d2d5be032d added an 'inlink' search option according to the suggestion in the YaCy
forum at
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=4572#p27410

The feature was not called 'haslink' but 'inlink', to keep the naming
analogous to 'inurl'. You can now search for
words in the links of a document, like:
* inlink:yacy
searches all documents which link to pages which have 'yacy' in the
url.
2013-01-14 12:50:21 +01:00
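How a query string with an 'inlink:' modifier might be split into plain words and modifier values is sketched below. The class `QueryModifierSketch` and its fields are hypothetical and do not reflect YaCy's actual query parser; the sketch only shows the general idea of treating 'inlink' analogously to 'inurl'.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryModifierSketch {
    final List<String> words = new ArrayList<>(); // plain search words
    String inurl;  // value of an inurl: modifier, or null if absent
    String inlink; // value of an inlink: modifier, or null if absent

    /** Splits a query on whitespace and separates modifiers from plain words. */
    static QueryModifierSketch parse(String query) {
        QueryModifierSketch q = new QueryModifierSketch();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty query
            }
            if (token.startsWith("inlink:") && token.length() > 7) {
                q.inlink = token.substring(7);
            } else if (token.startsWith("inurl:") && token.length() > 6) {
                q.inurl = token.substring(6);
            } else {
                q.words.add(token);
            }
        }
        return q;
    }
}
```

For the query `search engine inlink:yacy`, this sketch yields the words `search` and `engine` and the inlink value `yacy`.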
Michael Peter Christen
76e1e91b11 with strict compiler settings, IndexFederated_p does not compile without
@SuppressWarnings("deprecation")
2013-01-14 12:33:01 +01:00
reger
3897bb4409 added (manual) urldb migration (link on: Index Administration -> Federated Solr Index)
- migrates all entries in old urldb

Metadata coordinate (lat / lon) NumberFormatException still occurs relatively often (see excerpt below):
- added try/catch for URIMetadataRow (seems not to be needed in URIMetaDataNode, as Solr internally checks for number format)
- removed possible type conversion for lat() / lon() comparison with 0.0f, changed to 0.0 (leaving it to the compiler/optimizer to choose number format)

current log excerpt for NumberFormatException:
W 2013/01/14 00:10:07 StackTrace For input string: "-"
java.lang.NumberFormatException: For input string: "-"
	at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
	at java.lang.Double.parseDouble(Unknown Source)
	at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525)
	at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279)
	at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277)
	at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329)
	at transferURL.respond(transferURL.java:152)
...
Caused by: java.lang.NumberFormatException: For input string: "-"
	at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
	at java.lang.Double.parseDouble(Unknown Source)
	at net.yacy.kelondro.data.meta.URIMetadataRow$Components.lon(URIMetadataRow.java:525)
	at net.yacy.kelondro.data.meta.URIMetadataRow.lon(URIMetadataRow.java:279)
	at net.yacy.search.index.SolrConfiguration.metadata2solr(SolrConfiguration.java:277)
	at net.yacy.search.index.Fulltext.putMetadata(Fulltext.java:329)
	at transferURL.respond(transferURL.java:152)
2013-01-14 03:06:24 +01:00
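The try/catch guard mentioned in commit 3897bb4409 can be illustrated with a small sketch that tolerates malformed coordinate strings such as the "-" seen in the log excerpt. The class `CoordinateSketch` and its methods are hypothetical; treating 0.0 as "no coordinate" is an assumption based on the comparison with 0.0 mentioned in the commit message.

```java
final class CoordinateSketch {

    /** Parses a stored coordinate; returns 0.0 for missing or malformed values such as "-". */
    static double parseCoordinate(String raw) {
        if (raw == null) {
            return 0.0;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            // assumption: 0.0 stands for "no coordinate available"
            return 0.0;
        }
    }

    /** True if at least one of the two coordinates carries a non-zero value. */
    static boolean hasLocation(double lat, double lon) {
        return lat != 0.0 || lon != 0.0;
    }
}
```

A caller would parse both values first and only write location fields when `hasLocation` returns true.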
reger
3b6e08b49f prevent checking of urldb if empty
- disconnect urlIndexFile if empty
- add missing lock class in submenuSearchConfiguration
2013-01-12 15:20:23 +01:00
reger
1fb452174a read defaults from yacy.init for "Set to Defaults" button 2013-01-05 20:47:18 +01:00
reger
f143804382 fix configuration for search page navigators
- added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page)
   - currently redundant setting with part of ConfigPortal page
- added missing config for filetype and protocol navigator
- adjusted init of SearchEvent to check navigation config setting
- renamed RankingProcess.getTopicNavigator to getTopics (to distinguish it from the added SearchEvent.getTopicNavigator)
2013-01-05 19:00:54 +01:00
Michael Peter Christen
24db2fcd9d fix for Network info 2013-01-05 11:52:35 +01:00
Michael Peter Christen
22c694f906 activated the clickdepth_i attribute for solr again because the
calculation of that value is not as extensive as expected and
furthermore the value is very useful for ranking
2013-01-05 01:00:18 +01:00
Michael Peter Christen
becd52a984 added also a re-calculation of reference counts during the
post-processing of clickcount calculations. This is a really nice thing
to have because the reference count affects ranking.
2013-01-05 00:58:27 +01:00
Michael Peter Christen
fc47109608 added 'Last Hour' to network statistics 2013-01-05 00:37:52 +01:00
Michael Peter Christen
38d3feae65 added separate delete commands for the local+remote solr index, the old
metadata and old rwi and for the citation index. The important
advancement is the separation of the citation index deletion because
that index is responsible for the linkdepth calculation. Now a search
index can be deleted without the citation index, which should mean
that fewer clickdepths must be post-processed.
2013-01-04 16:39:34 +01:00
Michael Peter Christen
6f0baaa309 added the clickdepth post-processing: some links may have 'shortcuts' to
already calculated click depths. These are then calculated once the crawl
buffer is empty and therefore no new 'shortcuts' can be discovered.
The status of the clickdepth stack (to-be-processed) can be seen using a
solr search command like this:
http://localhost:8090/solr/select?q=process_sxt:[*%20TO%20*]&start=0&rows=30&fl=sku,clickdepth_i,process_sxt
2013-01-04 16:37:39 +01:00
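The Solr select URL from commit 6f0baaa309 can also be assembled programmatically. The sketch below is only an illustration: the class `ClickdepthQuerySketch` and the method `pendingClickdepthUrl` are hypothetical, while the field names and parameters are taken from the URL in the commit message. `URLEncoder` encodes the space as `+` rather than `%20`, which is equivalent in a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ClickdepthQuerySketch {

    /** Builds a select URL listing documents that still carry a process_sxt entry. */
    static String pendingClickdepthUrl(String base, int start, int rows) {
        try {
            // range query matching every document that has any process_sxt value
            String q = URLEncoder.encode("process_sxt:[* TO *]", "UTF-8");
            String fl = URLEncoder.encode("sku,clickdepth_i,process_sxt", "UTF-8");
            return base + "/solr/select?q=" + q
                    + "&start=" + start + "&rows=" + rows + "&fl=" + fl;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `pendingClickdepthUrl("http://localhost:8090", 0, 30)` produces a URL equivalent to the one quoted in the commit message.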
Michael Peter Christen
0f5b6f38c1 enhanced root-url detection 2013-01-03 19:21:21 +01:00
Michael Peter Christen
5a0eb1b268 clickpath should not be active by default because it needs extensive
computation - partly to be implemented
2013-01-03 01:30:05 +01:00
Michael Peter Christen
8ae08a2cac moved HTCache, Heuristics and Parser servlet to a more appropriate menu
location
2013-01-03 01:27:16 +01:00