Michael Peter Christen
f9c0e6e950
- Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.
2012-08-10 13:26:51 +02:00
Michael Peter Christen
bca4a16603
replaced the multivalue generic string field name suffix _ss by _txt
because _ss is not part of the standard solr example schema.
2012-08-06 17:58:09 +02:00
orbiter
67edfd991c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-08-05 15:49:48 +02:00
orbiter
d9173ba7ed
added more solr fields to integrate values from URIMetadataRow. All
writings to the Metadata-DB are now also done to solr. This includes
metadata transfer during search and rwi transfer.
The new/added solr fields are:
## time when resource was loaded
load_date_dt
## date until resource shall be considered as fresh
fresh_date_dt
## id of the host, a 6-byte hash that is part of the document id
host_id_s
## ids of referrer to this document
referrer_id_ss
## the md5 of the raw source
md5_s
## the name of the publisher of the document
publisher_t
## the language used in the document; starts with primary language
language_ss
## an external ranking value
ranking_i
## the size of the raw source
size_i
## number of links to audio resources
audiolinkscount_i
## number of links to video resources
videolinkscount_i
## number of links to application resources
applinkscount_i
2012-08-05 15:49:27 +02:00
Michael Peter Christen
3ce04cecf3
bad hack to prevent a bug appearing in solr
2012-07-31 23:49:07 +02:00
Michael Peter Christen
826967513b
changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
Michael Peter Christen
1517a3b7b9
added webm mime-type
2012-07-08 17:59:20 +02:00
Michael Peter Christen
0301aba1e9
removed unused method parameters
2012-07-05 10:23:07 +02:00
Michael Peter Christen
4de50fe808
adding more principal peers for bootstrapping
2012-07-05 00:43:41 +02:00
reger
067728bccc
add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages)
2012-07-01 00:12:20 +02:00
Michael Peter Christen
508a81b86c
added solr field 'refresh_s' which stores the refresh url contained in
the meta-refresh html header field.
2012-06-28 13:27:45 +02:00
Michael Peter Christen
9116013c64
- allow lazy initialization of solr value (if using 'lazy', then no
0-values and no empty strings are written). This may save a lot of
memory (in RAM and on disk) if excessive 0-values or empty strings
appear
- do not allow default boolean values for checkboxes because that does
not make sense: browsers may omit the checkbox attribute name if the box
is not checked. A default value 'true' would not comply with the
semantics of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields.
2012-06-27 12:17:58 +02:00
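The lazy initialization described in commit 9116013c64 can be pictured with a minimal sketch. The class and method names below (LazyFieldWriter, add, fields) are hypothetical and are not taken from the YaCy source; the sketch only assumes that "lazy" means a 0-value or an empty string is skipped instead of being written to the document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "lazy" field writing; not YaCy's actual classes.
final class LazyFieldWriter {
    private final boolean lazy;
    private final Map<String, Object> doc = new LinkedHashMap<>();

    LazyFieldWriter(boolean lazy) {
        this.lazy = lazy;
    }

    // In lazy mode a 0-value is not stored at all.
    void add(String field, int value) {
        if (lazy && value == 0) return;
        doc.put(field, value);
    }

    // In lazy mode null and empty strings are not stored at all.
    void add(String field, String value) {
        if (lazy && (value == null || value.isEmpty())) return;
        doc.put(field, value == null ? "" : value);
    }

    // The fields that would be sent to the index.
    Map<String, Object> fields() {
        return doc;
    }

    public static void main(String[] args) {
        LazyFieldWriter w = new LazyFieldWriter(true);
        w.add("size_i", 0);        // skipped: 0-value
        w.add("publisher_t", "");  // skipped: empty string
        w.add("md5_s", "abc");     // written
        System.out.println(w.fields().keySet());
    }
}
```

With lazy switched off, the same three calls would store all three fields, which is where the extra memory use comes from when many values are 0 or empty.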
Michael Peter Christen
c03d306afa
shorter autocommit time (now: 1 second) so that users can see
results in solr the first time they try it out. The value can now be
easily set to a higher number using the IndexFederated_p interface.
2012-06-26 14:53:45 +02:00
Michael Peter Christen
3fd4a01286
added option to record urls that are forwarded to the solr index
2012-06-26 13:54:48 +02:00
Michael Peter Christen
8dd469b9dd
added option to configure the autocommit delay time of solr on-the-fly
2012-06-25 14:59:46 +02:00
Michael Peter Christen
b9dfca4b0a
- fixed IndexFederated Servlet / an embedded Solr can now be selected
- added code stub for an embedded Solr but generation of Solr store is
still commented out (it works but is not yet ready for usage)
2012-06-25 11:34:38 +02:00
Michael Peter Christen
1be0025a9c
- added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (all) files needed for an embedded solr
2012-06-22 00:36:49 +02:00
Michael Peter Christen
dbdd697f4d
moved RDFaParser.xsl configuration file to defaults
2012-06-21 16:09:12 +02:00
Michael Peter Christen
8738336408
set Xms lower than Xmx
2012-06-19 08:45:49 +02:00
Michael Peter Christen
96f6a5869f
more robust OAI-PMH client (large time-out, three re-tries). OAI-PMH
servers appear to be very slow sometimes
2012-06-16 22:30:31 +02:00
Michael Peter Christen
6d17686258
made triplestore persistent by default
added a size display in triplestore servlet
2012-06-15 19:13:07 +02:00
cominch
3c255c025b
Show tags in search results (if activated in ConfigPortal_p.html)
2012-06-15 10:43:05 +02:00
Michael Peter Christen
a5cdfb91de
- fixed Cache link (below snippet)
- added 'Augmented Proxy' link below snippet
- added configuration options for augmented proxy
2012-06-14 19:55:34 +02:00
Roland 'Quix0r' Haeder
af5a597e47
Scroogle is not coming back, remove dead code
Conflicts:
source/net/yacy/search/Switchboard.java
2012-06-10 23:38:41 +02:00
cominch
90512640bf
Added config switches for custom parser
Conflicts:
source/net/yacy/document/TextParser.java
2012-06-10 12:49:36 +02:00
cominch
5d20cd324a
Add Triplestore and RDF query interface
Conflicts:
build.xml
defaults/yacy.init
source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:35:59 +02:00
cominch
a32943b382
add json mimetype
2012-06-10 09:29:09 +02:00
Michael Peter Christen
41c02cb10e
- less restrictions for usage of Table RAM copy
- new limit to use the table copy (instead of flag): 400MB available. If
less is available, then a copy is never used. If more is available, then
it can be used if there is a remaining space of at least 200MB
- flush caches more often: flush the Digest cache
2012-06-08 12:48:25 +02:00
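The memory rule stated in commit 41c02cb10e can be sketched as a small predicate. This is one reading of the message, not the code from the commit: the names (TableCopyPolicy, mayUseRamCopy) are hypothetical, "remaining space" is assumed to mean the memory left after the copy has been allocated, and exactly 400MB is assumed to count as sufficient.

```java
// Hypothetical sketch of the RAM-copy rule; names and exact boundaries are assumptions.
final class TableCopyPolicy {
    static final long MB = 1024L * 1024L;
    static final long MIN_AVAILABLE = 400 * MB; // below this a copy is never used
    static final long MIN_REMAINING = 200 * MB; // must stay free after the copy

    // Returns true if a table of copySizeBytes may be held as a RAM copy
    // when availableBytes of memory are currently free.
    static boolean mayUseRamCopy(long availableBytes, long copySizeBytes) {
        if (availableBytes < MIN_AVAILABLE) return false;
        return availableBytes - copySizeBytes >= MIN_REMAINING;
    }

    public static void main(String[] args) {
        System.out.println(mayUseRamCopy(300 * MB, 10 * MB));  // too little available
        System.out.println(mayUseRamCopy(500 * MB, 100 * MB)); // 400MB would remain
        System.out.println(mayUseRamCopy(500 * MB, 400 * MB)); // only 100MB would remain
    }
}
```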
Michael Peter Christen
8002fd2578
use less cache space since a large cache would cause more memory usage
in index files.
2012-06-06 14:17:42 +02:00
Michael Peter Christen
5aee19daa4
added show from cache in search results (not yet finished)
2012-06-04 23:44:26 +02:00
Michael Peter Christen
0d32a766ed
relax verify attribute for search widget to make it faster:
set to "cacheonly"
2012-05-20 00:50:54 +02:00
Michael Peter Christen
7eece0256f
moved yacy.logging to defaults according to request in
http://bugs.yacy.net/view.php?id=55
2012-05-17 04:26:03 +02:00
Michael Peter Christen
db9d81cb7a
ups
2012-05-16 01:04:08 +02:00
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
2012-05-15 12:25:46 +02:00
Michael Peter Christen
2be327b5ab
update location update
2012-04-19 11:49:43 +02:00
Michael Peter Christen
99c74699de
removed scroogle (scroogle is dead)
2012-02-25 12:57:59 +01:00
Michael Peter Christen
8bee1472c9
there is no noindex, only nofollow in links
2012-01-31 23:46:35 +01:00
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
2012-01-26 15:32:30 +01:00
Michael Peter Christen
696ee5fc16
removed pdf from default parser deny list
2012-01-23 17:27:58 +01:00
Lotus
c73af39e54
refactoring of tray icon class,
now uses Java 6 methods natively
2012-01-18 20:47:09 +01:00
Michael Peter Christen
987b412491
updated solr scheme: generic declaration of solr schemes
2012-01-13 11:25:15 +01:00
Michael Peter Christen
0bcef2d156
added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the given list of hosts unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
Michael Christen
17f962fceb
translator updates:
- config string for chinese
- do not copy the language file to DATA/LOCALE any more (and do not use
them there, this is really confusing for new translators)
2011-12-08 10:25:26 +01:00
Michael Christen
c715d19c09
fixes for dependency on svn
2011-12-06 22:05:22 +01:00
Michael Christen
f62e6fb438
less frequent DHT distribution to reduce the load a bit on every peer
2011-12-05 15:45:33 +01:00
Michael Christen
9dbc93613e
now that the whole world knows that we actually do p2p and not
metasearch we can support a default look-up to scroogle to gain more
attention to people who say that your search results are incomplete
2011-12-05 11:52:24 +01:00
orbiter
f9216e388c
- faster ping to clean up old peers faster
- clean up more news
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-30 21:21:16 +00:00
orbiter
ac5bda205f
- removed lower page navigation (it never looks nice)
- added visibility of metadata and parser in search results since that shows what YaCy can do in a nice way
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 13:30:42 +00:00
orbiter
c659310e89
- removed option to search for audio, video and applications. These things are still experimental and should not be shown to new users since this would cause them to argue that YaCy does not work. The functions are still available, because:
- added a configuration option in ConfigPortal to switch the search media types on or off
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 13:07:03 +00:00
orbiter
6cd27473f5
- better default values for caching and cache usage
- set new caching and verification behavior according to use case automatically
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 10:22:02 +00:00