Michael Peter Christen
9116013c64
- allow lazy initialization of solr value (if using 'lazy', then no
0-values and no empty strings are written). This may save a lot of
memory (in RAM and on disk) if excessive 0-values or empty strings
appear.
- do not allow default boolean values for checkboxes because that does
not make sense: browsers may omit the checkbox attribute name if the box
is not checked. A default value 'true' would not comply with the
semantics of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields.
2012-06-27 12:17:58 +02:00
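The lazy behaviour described in the commit above can be pictured with a minimal sketch. It is not YaCy's actual Solr writer: the class `LazyFieldWriter`, its methods and the field names in the usage note are hypothetical and only show the idea of skipping 0-values and empty strings when a 'lazy' flag is set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of lazy field writing; not YaCy code. */
final class LazyFieldWriter {
    private final boolean lazy;
    private final Map<String, Object> fields = new LinkedHashMap<>();

    LazyFieldWriter(boolean lazy) {
        this.lazy = lazy;
    }

    /** Stores a string field; in lazy mode null and empty strings are skipped. */
    void add(String name, String value) {
        if (lazy && (value == null || value.isEmpty())) return;
        fields.put(name, value);
    }

    /** Stores an integer field; in lazy mode 0-values are skipped. */
    void add(String name, int value) {
        if (lazy && value == 0) return;
        fields.put(name, value);
    }

    /** Returns the fields that would be written to the index. */
    Map<String, Object> fields() {
        return fields;
    }
}
```

With `new LazyFieldWriter(true)`, adding an empty `title`, a `wordcount` of 0 and a non-empty `host` leaves only `host` in `fields()`; with `false`, all three are kept.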
Michael Peter Christen
0294a53459
- add canonical field only if requested by solr schema
- remove canonical url from in/outbound urls if present
2012-06-26 14:51:57 +02:00
Michael Peter Christen
3fd4a01286
added option to record urls that are forwarded to the solr index
2012-06-26 13:54:48 +02:00
Michael Peter Christen
96aeb127e3
generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c
fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
8dd469b9dd
added option to configure the autocommit delay time of solr on-the-fly
2012-06-25 14:59:46 +02:00
Michael Peter Christen
b9dfca4b0a
- fixed IndexFederated Servlet / an embedded Solr can now be selected
- added code stub for an embedded Solr but generation of Solr store is
still commented out (it works but is not yet ready for usage)
2012-06-25 11:34:38 +02:00
Michael Peter Christen
fad3b14813
added jetty libraries, needed for future use as web server and as
application server for the solr search interface
2012-06-22 15:31:17 +02:00
Michael Peter Christen
a38b0a2c46
extended embedded solr tests to ensure that it will be usable within a
jetty instance
2012-06-22 11:40:02 +02:00
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
2012-06-22 11:39:17 +02:00
Michael Peter Christen
a5eb91fa60
refactoring
2012-06-22 00:49:32 +02:00
Michael Peter Christen
1be0025a9c
- added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (all) files needed for an embedded solr
2012-06-22 00:36:49 +02:00
Michael Peter Christen
e12bb254b4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-06-21 14:55:50 +02:00
Michael Peter Christen
3f55dc7c1e
- added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00
Michael Peter Christen
786be7d175
better integration of RDFaParser
2012-06-20 16:39:04 +02:00
Michael Peter Christen
0752983fbd
- automatic periodic saving of triplestore
- transaction-safe storage of triplestore
2012-06-17 10:50:12 +02:00
Michael Peter Christen
9264d8b4af
removed old navigation practice using subject tags in favor of
triplestore-tags
2012-06-17 00:33:40 +02:00
Michael Peter Christen
64c0268b2b
show triplestore metadata in yacydoc and viewfile
2012-06-16 17:40:15 +02:00
cominch
a95127c9af
Triplestore: initialize per-user triplestores
2012-06-14 11:46:53 +02:00
Michael Peter Christen
e89747bb67
- added automated generation of vocabularies from url stubs
- added clear of all terms for vocabularies
- added deletion of vocabularies
2012-06-13 15:53:18 +02:00
Michael Peter Christen
8b53771db2
changed behavior of navigation processing:
- vocabulary annotation is not done any more into the metadata of urldb
- vocabularies are written into the jena triplestore using a rdf
vocabulary
- vocabularies for rdf triples must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- also added an Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
2012-06-11 23:49:30 +02:00
Michael Peter Christen
5fc6524ca8
- moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
2012-06-11 16:48:53 +02:00
Michael Peter Christen
4ee6fb1de9
added missing blacklist dht cache storage (maybe due to mistakes in
cherry picking)
2012-06-11 00:38:02 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1
Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143
Conflicts:
htroot/Supporter.java
htroot/yacy/crawlReceipt.java
htroot/yacy/transferRWI.java
htroot/yacy/transferURL.java
source/de/anomic/crawler/CrawlStacker.java
source/de/anomic/data/ListManager.java
source/net/yacy/peers/Protocol.java
source/net/yacy/repository/Blacklist.java
source/net/yacy/repository/LoaderDispatcher.java
source/net/yacy/search/Switchboard.java
source/net/yacy/search/index/MetadataRepository.java
source/net/yacy/search/index/Segment.java
source/net/yacy/search/query/RWIProcess.java
source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Roland 'Quix0r' Haeder
af5a597e47
Scroogle is not coming back, remove dead code
Conflicts:
source/net/yacy/search/Switchboard.java
2012-06-10 23:38:41 +02:00
cominch
65c5826d93
bugfix
Conflicts:
source/net/yacy/document/parser/augment/AugmentParser.java
2012-06-10 13:11:54 +02:00
Michael Peter Christen
cde20911bb
saved a bit more ram using UTF8 String compression for OpenGeoDB and
Geonames data files.
2012-06-09 10:07:11 +02:00
Michael Peter Christen
2280a7b276
- changed initialization order to prefer allocation of memory for table
files first
- bugfixes in memory amount calculation
2012-06-09 09:05:47 +02:00
Michael Peter Christen
0746308bc2
only the metadata tables shall be able to use the tail cache
2012-06-08 18:36:11 +02:00
Michael Peter Christen
41c02cb10e
- less restrictions for usage of Table RAM copy
- new limit to use the table copy (instead of flag): 400MB available. If
less is available, then a copy is never used. If more is available, then
it can be used if there is a remaining space of at least 200MB
- flush caches more often: flush the Digest cache
2012-06-08 12:48:25 +02:00
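The limit described in the commit above can be read as a two-step rule. The following is a minimal sketch under that reading, not the code from the commit: the class `TableCopyPolicy`, the method `canUseRamCopy` and the treatment of exactly 400MB as sufficient are assumptions.

```java
/** Hypothetical sketch of the table RAM copy rule; not YaCy code. */
final class TableCopyPolicy {
    /** At least this much memory must be available before a copy is considered. */
    static final long MIN_AVAILABLE_BYTES = 400L * 1024L * 1024L;
    /** At least this much memory must remain after the copy has been made. */
    static final long MIN_REMAINING_BYTES = 200L * 1024L * 1024L;

    /**
     * @param availableBytes memory currently available to the application
     * @param tableBytes     size of the table that would be copied into RAM
     * @return true if a RAM copy may be used under the rule above
     */
    static boolean canUseRamCopy(long availableBytes, long tableBytes) {
        if (availableBytes < MIN_AVAILABLE_BYTES) return false; // never copy below the limit
        return availableBytes - tableBytes >= MIN_REMAINING_BYTES;
    }
}
```

Under these assumptions a 100MB table is copied when 500MB are available (400MB remain), but a 400MB table is not (only 100MB would remain).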
Michael Peter Christen
dd14b19c26
lazy initialization of block rank table ... only normal web search uses
this. When interactive search or location search is used, the block rank
is switched off
2012-06-08 09:41:29 +02:00
Michael Peter Christen
701b9a28a0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
htroot/PerformanceMemory_p.java
2012-06-08 09:16:16 +02:00
Michael Peter Christen
ab7107b34b
fixed RWIProcess queue limits: now discovering hidden results for mass
result retrieval
2012-06-08 09:14:54 +02:00
Michael Peter Christen
b0095c8d3c
flush the compressor cache when a cleanup is done
2012-06-07 19:42:33 +02:00
Michael Peter Christen
a61f44f9e4
lazy initialization of block rank table.
As a result, the table is not initialized as long as no search is
done. The effect is strongest if YaCy is started headless, because then
no browser pop-up loads the search page and thereby triggers the
initialization of the table.
2012-06-07 13:16:38 +02:00
Michael Peter Christen
96e9d77270
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
2012-06-06 20:13:28 +02:00
Michael Peter Christen
00f2df1120
a variety of possible memory leak fixes
2012-06-06 18:23:18 +02:00
Michael Peter Christen
d0ec8018f5
fixes for bad long computation
2012-06-06 14:13:31 +02:00
Michael Peter Christen
461a0ce052
removed warnings
2012-06-05 20:03:43 +02:00
Michael Peter Christen
407fdf6968
more bug fixes and performance hacks for search process
2012-06-05 15:04:23 +02:00
Michael Peter Christen
a1fe65b115
performance hacks
2012-06-05 12:06:26 +02:00
Michael Peter Christen
2fe207f813
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-06-04 23:44:38 +02:00
Michael Peter Christen
5e562dcdb7
adapted vocabulary usage within annotation/navigation feature of search
to new SimpleVocabulary class
2012-06-04 23:43:30 +02:00
Michael Peter Christen
240045cf7c
fix for bad distance computation
2012-06-04 16:33:16 +02:00
Michael Peter Christen
e0d8643226
- performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> the goal is that they
are never reached.
2012-06-04 15:37:39 +02:00
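The "poison pill" mentioned in the commit above is a common way to terminate a queue consumer: the producer enqueues a sentinel object after its last result, and the consumer stops when it takes that sentinel. A minimal sketch of the pattern follows; the class `PoisonPillDemo` and its methods are hypothetical and not taken from YaCy's search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration of the poison pill pattern; not YaCy code. */
final class PoisonPillDemo {
    /** Sentinel compared by identity, so it cannot collide with a real result. */
    static final String POISON = new String("POISON");

    /** Feeds the results through a queue and collects them until the pill arrives. */
    static List<String> run(List<String> results) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (String r : results) queue.add(r);
            queue.add(POISON); // without this, the consumer would block forever
        });
        producer.start();

        List<String> collected = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take(); // blocks until an element is available
                if (item == POISON) break;  // termination signal from the producer
                collected.add(item);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining queue", e);
        }
        return collected;
    }
}
```

If the producer forgets the final `queue.add(POISON)`, `take()` never returns after the last result, which is the kind of hang a missing poison pill produces.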
Michael Peter Christen
9b4c699526
enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
2012-05-31 22:39:53 +02:00
Michael Peter Christen
834dc6b263
store more data from interface access
2012-05-31 00:47:07 +02:00
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed from a
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
2012-05-30 16:59:13 +02:00
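The hash cache described in the commit above can be sketched as a key object that computes the hash of its byte array once and reuses it afterwards. This is an illustration only: the class `CachedHashKey` is hypothetical, and `Arrays.hashCode` stands in for whatever hash function the commit actually caches.

```java
import java.util.Arrays;

/** Hypothetical byte-array key with a cached hash code; not YaCy code. */
final class CachedHashKey {
    private final byte[] bytes;
    /** Cached hash; 0 means "not computed yet" (same convention as java.lang.String). */
    private int hash;

    CachedHashKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the cached hash valid
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(bytes); // computed on first use only
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashKey
                && Arrays.equals(bytes, ((CachedHashKey) other).bytes);
    }
}
```

A `HashMap` calls `hashCode()` on every `put` and `get`, so a key that is looked up many times pays for the array scan only once. The rare array whose hash is exactly 0 is recomputed on each call, which is correct but not cached.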
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
2012-05-30 16:26:05 +02:00
Michael Peter Christen
c846e9ca14
redesign of the crawler monitor page: show crawled pages instead of
queue of urls that shall be crawled
2012-05-25 01:45:38 +02:00