Commit Graph

4191 Commits

Author SHA1 Message Date
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
563d584420 removed more dependencies in cora from kelondro 2012-09-21 11:02:36 +02:00
orbiter
63762d8f89 removed kelondro dependencies from cora 2012-09-20 19:38:22 +02:00
orbiter
089a03114e full memory usage for debian and when changing the size: debian seems to
dislike the big difference between xmx and xms (I have crashes here
which stop if both values are the same)
2012-09-18 22:31:01 +02:00
orbiter
60b1e23f05 added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. Default patterns are in such a way that all
loaded pages are also indexed (as before) but when doing an expert crawl
start, then the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
that the depth limit applies) but the user can select paths which will
be loaded completely even if the crawl depth is reached.
2012-09-16 21:27:55 +02:00
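A minimal sketch of how the three patterns described in this commit could interact. This is not the YaCy implementation (which is Java); Python is used only for brevity. The function names, the full-match semantics and the concrete default patterns are assumptions made for illustration.

```python
import re

# Assumed defaults: index everything, exclude nothing, never lift the depth limit.
MATCH_ALL = ".*"
MATCH_NEVER = "(?!)"  # empty negative lookahead: matches no string at all


def should_index(url, must_match=MATCH_ALL, must_not_match=MATCH_NEVER):
    """A loaded page is indexed only if it passes both index filters."""
    return (re.fullmatch(must_match, url) is not None
            and re.fullmatch(must_not_match, url) is None)


def depth_allowed(url, depth, max_depth, no_depth_limit_match=MATCH_NEVER):
    """URLs matching the no-depth-limit pattern ignore the crawl depth."""
    if re.fullmatch(no_depth_limit_match, url) is not None:
        return True
    return depth <= max_depth
```

With the assumed defaults every loaded page is indexed and the crawl depth always applies, which mirrors the "as before" behaviour the message describes.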
Michael Peter Christen
6ec02deec6 added new crawl attributes in crawl profile (not active yet) 2012-09-14 16:49:29 +02:00
Michael Peter Christen
a13e5153ac - added the possibility to have not one but a list of crawl start urls
- the list of urls is entered in the expert crawl start in a textfield;
the one-line input field was replaced with a text box
- start urls can also be given in one single line where the urls are
separated by a '|'-character
- as an effect, the crawl profile cannot carry a single start url for
identification because it is possible to have more. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: removed the
url field from all these servlets
- to work consistently with several start urls and the other crawl
starts which computed crawl start url lists from sitelists or sitemaps,
the crawl start servlet was restructured completely
- new rules for must-match patterns were created to make it possible
that site crawl starts also work with several crawl starts at once
2012-09-14 12:25:46 +02:00
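The two input forms mentioned in this commit (one url per line in the text box, or several urls on one line separated by '|') can be normalized in one step. The sketch below is illustrative only: the helper name is hypothetical, and dropping blank entries and duplicates is an assumption, not something the commit message states.

```python
import re


def parse_start_urls(text):
    """Split a crawl start input into a list of start urls.

    Accepts urls separated by line breaks, by '|', or by a mix of both.
    """
    urls = []
    for part in re.split(r"[\r\n|]+", text):
        part = part.strip()
        if part and part not in urls:  # assumed: skip blanks and duplicates
            urls.append(part)
    return urls
```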
Michael Peter Christen
975bc95ddf added default facet fields for json response format (stub) 2012-09-14 12:09:20 +02:00
Michael Peter Christen
2f218df55d added missing license headers 2012-09-14 12:06:06 +02:00
Michael Peter Christen
a30653a864 added a regular expression test servlet which is linked within the
parser/crawler error page whenever a problem with a regular expression
occurs.
This makes it easy to correct and enhance the must-match and
must-not-match patterns just by trying out which pattern could be
correct.
2012-09-14 12:04:54 +02:00
Michael Peter Christen
0504b01bdc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-14 00:48:17 +02:00
orbiter
a55e77a115 added twitter search heuristic 2012-09-13 23:53:53 +02:00
Michael Peter Christen
e54ac38095 - some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
9644c186a4 added search functionality to ViewFile.html servlet 2012-09-11 02:03:14 +02:00
Michael Peter Christen
b69ed96f0b - added collections to yacydoc
- changed yacydoc.htm to yacydoc.json
- added query logging in solr and gsa search result
2012-09-10 15:20:55 +02:00
Michael Peter Christen
5df553c152 - added a json writer for solr (yes there was one using xslt but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
2012-09-10 14:30:44 +02:00
Michael Peter Christen
4d29f59a27 removed warnings 2012-09-10 07:15:52 +02:00
Michael Peter Christen
8c099d2106 Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/api/ymarks/import_ymark.java
	source/de/anomic/data/ymark/YMarkEntry.java
	source/de/anomic/data/ymark/YMarkTables.java
2012-09-10 07:05:20 +02:00
apfelmaennchen
59bd478ed1 Added more sophisticated RDF output for YMarks, including the folder
structure (b:Topic) and support for multiple tags (dc:subject) and
folders (b:hasTopic) via rdf:Bag container.
2012-09-09 22:56:24 +02:00
apfelmaennchen
d31a632951 - added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
2012-09-09 09:53:58 +02:00
orbiter
66ac4076c2 added disjunction '|' option to site parameter in GSA API 2012-09-06 22:35:55 +02:00
sixcooler
9ee2e09983 statistics for solr-cache 2012-09-06 22:02:29 +02:00
Michael Peter Christen
d8425e6809 added collections to crawl monitor 2012-09-04 14:47:53 +02:00
Michael Peter Christen
4b36a2c3b4 small style changes 2012-09-04 11:23:41 +02:00
Michael Peter Christen
8ca842b137 added new button design to more buttons 2012-09-03 16:04:57 +02:00
Michael Peter Christen
b2b516cc3e added a collection attribute to crawls and searches:
- a solr field collection_sxt can be used to store a set of crawl tags
- when this field is activated, a crawl tag can be assigned when crawls
are started
- the content of the collection field can be comma-separated, all of
them are assigned to the documents when they are indexed as result of
such a crawl start
- a search result can be drilled down to a specific collection; this is
currently only available in the solr interface and also in the gsa
interface using the 'site' option
- this adds a mandatory field for gsa queries (the google api demands
that field all the time)
2012-09-03 15:26:08 +02:00
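As a rough illustration of the comma-separated collection value described above: every tag in the list ends up on each document indexed by that crawl. Only the field name collection_sxt comes from the commit message; the helper names and the dict-based document are hypothetical.

```python
def parse_collections(value):
    """Turn 'news, blogs,news' into ['news', 'blogs']."""
    tags = []
    for tag in value.split(","):
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def with_collections(doc, value):
    """Return a copy of doc carrying all crawl tags in the multi-valued field."""
    tagged = dict(doc)
    tagged["collection_sxt"] = parse_collections(value)
    return tagged
```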
Michael Peter Christen
174530a9e0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-03 00:46:17 +02:00
apfelmaennchen
43f3a932fd removed jquery.slider as it is already included as part of jquery-ui
package
2012-09-01 14:17:20 +02:00
apfelmaennchen
a01eb1b7fe removed unused jquery plugin slider as it is part of jquery-ui package 2012-09-01 10:25:22 +02:00
Michael Peter Christen
f75b3f8a47 added more patches to work without RWI data structure 2012-08-31 14:35:56 +02:00
Michael Peter Christen
a427a68bac removed many warnings 2012-08-31 14:07:33 +02:00
Michael Peter Christen
c72c435517 - moved the gsa search interface from /gsa/searchresult? to /gsa/search?
- fixed the NB field data
2012-08-31 14:00:53 +02:00
Michael Peter Christen
31d4d38804 - extended the solr interface by a references-by-word-count method
- reduced danger that a non-existing RWI database causes NPEs
- added Solr queries to did-you-mean: this makes it possible that our
did-you-mean algorithm works together with only Solr and without RWIs
2012-08-31 13:03:00 +02:00
Michael Peter Christen
528d6763fa - added new solr fields:
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
2012-08-31 10:30:43 +02:00
Michael Peter Christen
3142e675e8 fixed problems with GSA api:
- better FS attribute
- highlighting of searched words in title
2012-08-29 16:48:53 +02:00
Michael Peter Christen
3b19fe7b52 - fixed num parameter in GSA api
- changed FS attribute in GSA api
2012-08-29 16:28:32 +02:00
Michael Peter Christen
75d5e3475d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-29 10:13:51 +02:00
cominch
dc468dad01 add content control features for custom filter lists 2012-08-29 09:04:28 +02:00
Michael Peter Christen
316b5fe116 - added a solr type definition verifier
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes
2012-08-28 16:58:06 +02:00
reger
2d2be546fe fix path to env/grafics to display api icon on meta data page 2012-08-26 04:36:52 +02:00
orbiter
7ac259477f added a direct access to solr search api to enhance the visibility of
the embedded solr
2012-08-24 23:04:19 +02:00
orbiter
67f2866cd0 small fixes 2012-08-24 21:44:22 +02:00
orbiter
479bfca571 refactoring 2012-08-23 09:30:11 +02:00
Michael Peter Christen
48a82bc705 log queries anonymous from gsa+solr requests 2012-08-22 23:50:40 +02:00
Michael Peter Christen
ab6ec4ec52 added snippet computation to solr/rss and gsa result writer 2012-08-22 17:37:34 +02:00
Michael Peter Christen
4716546ef5 - reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)
2012-08-22 16:30:33 +02:00
Michael Peter Christen
0ad52ac4c3 gsa bugfix for date parser 2012-08-21 02:39:28 +02:00
Michael Peter Christen
3ce4c2f937 fixes for gsa result format 2012-08-21 01:57:46 +02:00
Michael Peter Christen
2d5fdfeb65 added authorization-based maximum results limitation to solr and gsa
search
2012-08-20 17:10:48 +02:00
Michael Peter Christen
6fc5400f91 added a tooltip for search navigation to mention that search pages can
be navigated using the TAB key
2012-08-20 13:02:29 +02:00
Michael Peter Christen
a06123aec6 more abstraction and less parameter overhead for remote search 2012-08-20 01:29:15 +02:00
Michael Peter Christen
f00733186b code simplifications 2012-08-19 13:17:03 +02:00
orbiter
780f8974e7 added remaining iteration methods for solr in fulltext class 2012-08-18 15:39:14 +02:00
orbiter
6f01542aaa explicit double-check in transferURL 2012-08-18 13:18:51 +02:00
Michael Peter Christen
d54b80327a refactoring 2012-08-17 17:28:27 +02:00
Michael Peter Christen
0cab06c47c refactoring 2012-08-17 15:52:33 +02:00
Michael Peter Christen
40c0856489 refactoring 2012-08-17 15:33:02 +02:00
Michael Peter Christen
e651d3e320 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-17 14:45:18 +02:00
Michael Peter Christen
06a78eecb7 code simplification 2012-08-17 14:43:32 +02:00
cominch
8a91f4fa42 local robots.txt: disallow external crawlers to follow the URL proxy 2012-08-17 11:47:39 +02:00
Michael Peter Christen
18f989dfb1 - refactoring (load -> getMetadata)
- added getDocument to retrieve Solr documents which shall replace
getMetadata
2012-08-17 01:34:38 +02:00
Michael Peter Christen
6197caf698 added clear-text search words in query params 2012-08-16 23:05:37 +02:00
Michael Peter Christen
23226676c6 FOR THE BRAVE.. this is a forced migration to solr which is now ready
for production as a replacement of the metadata-db.
This intermediate release 1.041 will switch on the previously optional
solr index and the old metadata-db will still work as it did before.
Solr+metadata are accessed in mixed mode, no migration is done yet.
If this does not cause a catastrophe until the end of the weekend, we will
do a YaCy 1.1 main release containing this as default.
2012-08-16 18:17:47 +02:00
Michael Peter Christen
7c31be1c80 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-16 17:45:26 +02:00
cominch
6456a1656a changed local robots.txt to prevent external crawlers from submitting random
search queries
2012-08-16 17:38:10 +02:00
Michael Peter Christen
703f427303 fixed some peer-ping connection details
- larger time-out
- removed too old seedlist
- fixed a bug in connection test
2012-08-16 17:11:54 +02:00
Michael Peter Christen
597bb76e4f get the peer location more quickly 2012-08-16 16:28:57 +02:00
orbiter
156d457aec fix for Index out of bounds exception in Network servlet 2012-08-16 07:47:52 +02:00
Lotus
ae9cd7a118 fix xss bug #204 2012-08-15 14:23:21 +02:00
Michael Peter Christen
d988ba50cf added a very rudimentary, incomplete, non-verified GSA response writer
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10
2012-08-14 12:40:26 +02:00
Michael Peter Christen
aab0b680c3 - added xslt support for solr result formats.
try e.g.
http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl
- added servlet-side mime-type configuration for streamed servlets. this
is used for the result formatters in solr result formats
2012-08-14 11:12:50 +02:00
cominch
ad62609ec7 added a possibility to define a custom network definition URL for remote
management
2012-08-13 16:57:53 +02:00
cominch
fb0f430685 Merge remote-tracking branch 'original yacy/master' 2012-08-13 16:48:14 +02:00
Michael Peter Christen
b51df6c7e8 - added coordinate storage in solr schema
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html
2012-08-13 10:40:04 +02:00
orbiter
9b88433f45 patch from hint in
http://forum.yacy-websuche.de/viewtopic.php?p=26858#p26858
from gaston
2012-08-10 15:44:37 +02:00
orbiter
e816b88b55 changed behaviour of metadata storage: in case that any solr is
attached, the metadata is not written to the metadata-db, even if it is
enabled but instead to solr. This prevents that metadata is written in
two store systems at the same time. It is also the next step to migrate
the current metadata-db to solr.
2012-08-10 15:39:10 +02:00
Michael Peter Christen
f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.
2012-08-10 13:26:51 +02:00
Michael Peter Christen
b2b480fff2 more abstraction of the YaCySchema -> Opensearch matching process 2012-08-10 09:48:15 +02:00
Michael Peter Christen
73f6d69d03 more abstraction for solr query params parsing 2012-08-10 07:58:45 +02:00
Michael Peter Christen
24462e9baa set the title every time, it is possible that it has changed 2012-08-10 07:51:57 +02:00
Michael Peter Christen
136fcb1ad9 refactoring 2012-08-10 06:47:13 +02:00
Michael Peter Christen
a12f693ec9 added two response writers for the embedded solr interface:
a rss/opensearch writer and an enhanced solr xml writer.
The enhanced solr writer has less configuration overhead than the
original writer and should be slightly faster. The rss/opensearch writer
is at this time slightly incomplete compared with the already existing
rss search result from YaCy and also snippets are missing at this time.
To test the new interface, open for example:
http://localhost:8090/solr/select?wt=rss&q=olympia
The wt-codes for the new result writers are:
wt=rss for opensearch
wt=exml for the enhanced solr xml writer.
Additionally, the SRU search parameters have been added to the solr
interface which can now also be used for a normal solr/xml search.
2012-08-09 18:06:48 +02:00
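The test URL above can also be assembled programmatically. The sketch below only builds the query string; the base URL and the wt codes rss and exml are taken from the commit message, while the helper name and the extra keyword parameters are assumptions for illustration.

```python
from urllib.parse import urlencode


def solr_select_url(query, wt=None,
                    base="http://localhost:8090/solr/select", **params):
    """Build a select URL for the embedded solr interface."""
    fields = {"q": query}
    if wt is not None:
        fields["wt"] = wt  # e.g. "rss" (opensearch) or "exml" (enhanced xml)
    fields.update(params)  # any further standard solr parameters
    return base + "?" + urlencode(fields)


if __name__ == "__main__":
    print(solr_select_url("olympia", wt="rss"))
```

Note that urlencode percent-encodes characters such as '*' and ':', so a query like *:* is sent in escaped form and decoded again by the server.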
orbiter
67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-05 15:49:48 +02:00
orbiter
d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
writings to the Metadata-DB are now also done to solr. This includes
metadata transfer during search and rwi transfer.

The new/added solr fields are:

## time when resource was loaded
load_date_dt

## date until resource shall be considered as fresh
fresh_date_dt

## id of the host, a 6-byte hash that is part of the document id
host_id_s

## ids of referrer to this document
referrer_id_ss

## the md5 of the raw source
md5_s

## the name of the publisher of the document
publisher_t

## the language used in the document; starts with primary language
language_ss

## an external ranking value
ranking_i

## the size of the raw source
size_i

## number of links to audio resources
audiolinkscount_i

## number of links to video resources
videolinkscount_i

## number of links to application resources
applinkscount_i
2012-08-05 15:49:27 +02:00
Michael Peter Christen
70b10e8316 added the JSON response writer to solr interface, add &wt=json to the
servlet GET properties to use this format
2012-08-01 00:14:56 +02:00
Michael Peter Christen
8d944f6517 nowrap from gaston in forum
http://forum.yacy-websuche.de/viewtopic.php?p=26815#p26815
2012-07-30 12:39:47 +02:00
Michael Peter Christen
24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
value than crawling processes. This speeds up the search result
preparation dramatically.
2012-07-30 10:38:23 +02:00
Michael Peter Christen
1687737771 Abstraction of HandleMap and HandleSet 2012-07-27 12:13:53 +02:00
Michael Peter Christen
3bcd9d622b cleaned up classes and methods which are either superfluous at this time
or will be superfluous or subject of complete redesign after the
migration to solr. Removing these things now will make the transition to
solr more simple.
2012-07-25 14:31:54 +02:00
Michael Peter Christen
6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is
written. Also done some code-cleanup.
2012-07-25 01:53:47 +02:00
Michael Peter Christen
315d83cfa0 cleanup 2012-07-24 22:16:56 +02:00
Michael Peter Christen
76202f068e extended abstraction of local and remote solr index using one front-end
for index administration and querying.
2012-07-24 17:23:29 +02:00
Michael Peter Christen
7ec7341f60 added user-authentication protection to solr search (same as implemented
for yacysearch)
2012-07-23 21:43:14 +02:00
Michael Peter Christen
e2a97ef8f6 better explain how to access the embedded solr 2012-07-23 21:31:12 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
Michael Peter Christen
cba4ab862e fix for http://bugs.yacy.net/view.php?id=202 2012-07-23 00:36:18 +02:00
reger
36c9875b6e removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid) 2012-07-23 00:00:40 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
6cc5d1094e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-21 13:34:57 +02:00
orbiter
05a3ffd03a patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa
2012-07-20 11:47:50 +02:00
orbiter
5a3c829872 embedded solr is only initiated if it is activated with
IndexFederated_p.html
2012-07-20 11:40:33 +02:00
Lotus
3a350a2f83 partial html fix for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454
2012-07-20 08:53:12 +02:00
Michael Peter Christen
97b7bcf2a6 added a solr search index
- by default, a (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. this fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
play with different search fields as you can see in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
2012-07-19 11:34:05 +02:00
Michael Peter Christen
f78ce93a80 collection of speed and memory saving hacks 2012-07-13 21:15:38 +02:00
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
e3aa05b9dd added creation of subpath pattern when crawl start is 'from file' 2012-07-11 23:18:57 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
Roland 'Quix0r' Haeder
aef9dd0350 - removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
2012-07-10 13:08:16 +02:00
orbiter
c7afa8bc48 using SwitchboardConstants for solr attributes 2012-07-10 12:01:20 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
91f14ea38e fix to solr configuration (case where the external solr was not online) 2012-07-06 01:29:13 +02:00
sixcooler
2c5b68d932 more abstraction of error message 2012-07-05 14:50:37 +02:00
Michael Peter Christen
9758c521ab abstraction of error message 2012-07-05 14:27:28 +02:00
sixcooler
9b6e4e46ca fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430 2012-07-05 14:06:00 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
5bd3c90907 - removed unnecessary semicolons
- added default case for switch
2012-07-05 11:18:31 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
241dd8410a removed snippet pattern filter - it was not used 2012-07-05 09:21:27 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
orbiter
78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used a too complex input stream; generated from sources and UTF8
byte-conversions. The BufferedReader applied a strong overhead.
- to feed data into the SentenceReader, multiple toString/getBytes had
been applied until a buffered Reader from an input stream was possible.
These superfluous conversions had been removed.
- the best source for the Sentence Reader is a String. Therefore the
production of Strings had been forced inside the Document class.
2012-07-04 21:15:10 +02:00
Michael Peter Christen
276a66a793 Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that it can easily cause an OOM just by the number of links.
The question whether a limit of 1000 links is sufficient or too low must
be answered with the result of testing this feature.
2012-07-03 17:06:20 +02:00
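A minimal sketch of such a cap, assuming the parser yields links one at a time. The limit of 1000 is from the commit message; the function name and the streaming approach are illustrative assumptions, not the actual parser code.

```python
from itertools import islice

MAX_LINKS = 1000  # limit stated in the commit message


def collect_links(links, limit=MAX_LINKS):
    """Keep at most `limit` links; the rest of the iterable is never read."""
    return list(islice(links, limit))
```

Because islice stops consuming its input once the limit is reached, a page with an enormous number of links never has all of them held in memory at once.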
Michael Peter Christen
1825f165b8 better integration of blacklist according to use case 2012-07-02 13:57:29 +02:00
Michael Peter Christen
c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-07-02 12:20:57 +02:00
Michael Peter Christen
ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. 2012-07-02 10:27:46 +02:00
reger
067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages) 2012-07-01 00:12:20 +02:00
Michael Peter Christen
03280fb161 removed segments-concept and the Segments class:
the segments had been there to create a tenant-infrastructure but were
never used since that was all much too complex. There will be a
replacement using a solr navigation using a segment field in the search
index.
2012-06-28 14:27:29 +02:00
Michael Peter Christen
9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no
0-values and no empty strings are written). This may save a lot of
memory (in ram and on disc) if excessive 0-values or empty strings
appear
- do not allow default boolean values for checkboxes because that does
not make sense: browsers may omit the checkbox attribute name if the box
is not checked. A default value 'true' would not comply with the
semantics of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields.
2012-06-27 12:17:58 +02:00
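The lazy initialization described in the first item can be pictured as a guard around each field write. This is a sketch under assumptions: the helper names and the dict-based document are hypothetical, and treating booleans as always written is a choice made here, not something the commit message specifies.

```python
def is_skippable(value):
    """True for the values a lazy write omits: numeric 0 and the empty string."""
    if isinstance(value, bool):  # assumed: booleans are always written
        return False
    return value == 0 or value == ""


def add_field(doc, name, value, lazy=True):
    """Write a field unless lazy mode is on and the value carries no information."""
    if lazy and is_skippable(value):
        return
    doc[name] = value
```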
cominch
e6792ed37d Merge remote-tracking branch 'original yacy/master' 2012-06-26 10:13:13 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly 2012-06-25 14:59:46 +02:00
Michael Peter Christen
b9dfca4b0a - fixed IndexFederated Servlet / an embedded Solr can now be selected
- added code stub for an embedded Solr but generation of Solr store is
still commented out (it works but is not yet ready for usage)
2012-06-25 11:34:38 +02:00
Michael Peter Christen
fad3b14813 added jetty libraries, needed for future use as web server and as
application server for the solr search interface
2012-06-22 15:31:17 +02:00
Michael Peter Christen
b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods 2012-06-22 11:39:17 +02:00
Michael Peter Christen
a5eb91fa60 refactoring 2012-06-22 00:49:32 +02:00
cominch
c1ba58ae51 Augmented browsing: Small CSS fix 2012-06-21 14:22:32 +02:00
cominch
b2b205aa38 Augmented browsing: small js fix 2012-06-21 12:02:14 +02:00
cominch
dc9ee0cdb3 Augmented browsing: CSS fix 2012-06-21 11:19:55 +02:00
cominch
74fcc6f8c5 Augmented browsing: small UI modifications 2012-06-21 11:01:02 +02:00
cominch
c63c3a4495 Show additional interaction elements in footer section on each page, if
activated in ConfigPortal.html.
This footer is also visible in augmented browsing proxy mode.
2012-06-20 18:04:23 +02:00
cominch
fa98657bb3 Augmented Browsing: changed the settings page 2012-06-20 09:10:39 +02:00
cominch
751eeade0d Merge remote-tracking branch 'original yacy/master' 2012-06-20 07:58:27 +02:00
cominch
84a11ec48c Corrected loading of default page settings on ConfigPortal.html 2012-06-20 07:55:28 +02:00
sixcooler
bea002dc15 correct table in new look of Crawler_p 2012-06-19 13:13:00 +02:00
Michael Peter Christen
8738336408 set Xms lower than Xmx 2012-06-19 08:45:49 +02:00
cominch
6b4545d6b0 Only load tag information if necessary 2012-06-19 01:40:22 +02:00
cominch
011f8a5818 Auto Tagging: Add hyperlinks to tags (provisional) 2012-06-19 01:24:06 +02:00
Michael Peter Christen
1d4e206b2b bugfix in vocabulary generation 2012-06-18 18:10:40 +02:00
cominch
2c89975378 Merge remote-tracking branch 'original yacy/master' 2012-06-18 16:16:46 +02:00
cominch
71047fe63a Augmented browsing: CSS fix 2012-06-18 16:16:31 +02:00
Michael Peter Christen
52f5d40043 better abstraction of document model generation 2012-06-18 15:55:18 +02:00
Michael Peter Christen
8b7c4d3144 produce a rdf output containing the triplestore with yacydoc; ie:
http://localhost:8090/api/yacydoc.rdf?urlhash=yOiCM7Fh1hyQ
2012-06-18 15:47:54 +02:00
cominch
f7160dae5c Merge remote-tracking branch 'original yacy/master' 2012-06-18 15:44:50 +02:00
cominch
e4555cbee3 Augmented browsing: Pass on additional action parameter 2012-06-18 15:44:01 +02:00
Michael Peter Christen
24bbe359ca integrate also geonames library files with fewer cities. these are more
useful for tagging since fewer normal words are falsely identified as
locations
2012-06-18 15:19:57 +02:00
Michael Peter Christen
5a41e739b4 better apilink description 2012-06-18 13:04:20 +02:00
Michael Peter Christen
e16e4bd2ba added ontology extraction in xml as api call for vocabularies 2012-06-18 13:02:12 +02:00
cominch
8cf47a8335 Merge remote-tracking branch 'original yacy/master' 2012-06-18 12:11:07 +02:00
cominch
b85f01a14e Augmented browsing: small UI fix 2012-06-18 12:01:03 +02:00
Michael Peter Christen
26cb1c65c2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/document/importer/OAIPMHLoader.java
2012-06-17 23:50:44 +02:00
Michael Peter Christen
963f92ed9a - merged files
- changed behaviour of delete button in vocabulary edit
- fixed size number in vocabulary listing
2012-06-17 23:48:33 +02:00
cominch
d8815db877 Merge remote-tracking branch 'original yacy/master' 2012-06-17 23:07:00 +02:00
cominch
e4dab19045 Augmented Browsing: added template for document info bar 2012-06-17 23:05:53 +02:00
Michael Peter Christen
743b0ec89f - added size of vocabulary to vocabulary view
- fixed bad terms in vocabulary-from-titles autogeneration
2012-06-17 17:32:52 +02:00
Michael Peter Christen
22d5e33c5e added more methods to vocabulary generation: scrape document title and
document author to vocabulary
2012-06-17 14:53:16 +02:00
Michael Peter Christen
b2d1c25ebb removed warnings/unused entities 2012-06-17 11:22:08 +02:00
Michael Peter Christen
f1aa4c4390 - accept only location names with a minimum length
- remove comma from synonym terms
2012-06-17 10:15:26 +02:00
Michael Peter Christen
cc9ad7198a - use only names which consist of at least two parts
- remove word from derewo from locations
2012-06-17 01:12:31 +02:00
Michael Peter Christen
9264d8b4af removed old navigation practice using subject tags in favor of
triplestore-tags
2012-06-17 00:33:40 +02:00
Michael Peter Christen
eeb4fd8b8c refactoring (geolocalzation -> geolocation) 2012-06-16 22:09:32 +02:00
Michael Peter Christen
64c0268b2b show triplestore metadata in yacydoc and viewfile 2012-06-16 17:40:15 +02:00
Michael Peter Christen
c2f0d16d2c fixed vocabulary initialization 2012-06-16 13:12:02 +02:00
Michael Peter Christen
fbded1f466 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-16 12:42:43 +02:00
Michael Peter Christen
df3531f8d5 added the generation of virtual vocabularies using the pnd 2012-06-16 12:36:15 +02:00
Michael Peter Christen
e806106b10 jquery bugfix 2012-06-16 08:25:28 +02:00
Michael Peter Christen
a0f1decd82 - added loading of the dbpedia pnd triplestore in the dictionary loader
- renamed the dictionary loader to knowledge loader
- some refactoring in the library provider method names
2012-06-15 19:19:18 +02:00
Michael Peter Christen
6d17686258 made triplestore persistent by default
added a size display in triplestore servlet
2012-06-15 19:13:07 +02:00
Michael Peter Christen
8d6e77ad0c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-15 14:38:46 +02:00
cominch
2ac7a5c1f2 Augmented browsing: Add overlay bar which shows the vocabulary tags 2012-06-15 14:32:16 +02:00
Michael Peter Christen
777d22e145 renamed "augmented proxy" to "augmented browsing" 2012-06-15 11:54:33 +02:00
cominch
bddac2839e add missing files for tag display 2012-06-15 10:46:19 +02:00
cominch
441430f507 Merge remote-tracking branch 'original yacy/master' 2012-06-15 10:44:12 +02:00
cominch
3c255c025b Show tags in search results (if activated in ConfigPortal_p.html) 2012-06-15 10:43:05 +02:00
Michael Peter Christen
1f9120d189 create new vocabularies also without an objectspace. this creates an
empty vocabulary
2012-06-15 02:43:55 +02:00
Michael Peter Christen
a5cdfb91de - fixed Cache link (below snippet)
- added 'Augmented Proxy' link below snippet
- added configuration options for augmented proxy
2012-06-14 19:55:34 +02:00
Michael Peter Christen
492b3e09f2 added api icon to triplestore 2012-06-14 19:11:19 +02:00
Michael Peter Christen
16d8f33795 added objectlink generation to vocabulary generation and editor 2012-06-14 18:50:35 +02:00
Michael Peter Christen
f1f97b7c95 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-14 18:45:38 +02:00
Michael Peter Christen
b3eaaf5ebc check also delete triplestore by default 2012-06-14 18:14:45 +02:00
cominch
f2f07a11f1 hotfix for unresolved pattern 2012-06-14 18:05:10 +02:00
cominch
5fd1a15fcf hotfix until we have updated query routine for tags 2012-06-14 17:56:38 +02:00
cominch
f49d92d8da Cleanup of interaction class and helper routines 2012-06-14 17:41:45 +02:00
cominch
56b0115054 Triplestore: modify routines to access per user store 2012-06-14 15:44:27 +02:00
Michael Peter Christen
d45718251e refactoring (Localization -> Location) 2012-06-14 09:45:57 +02:00
Michael Peter Christen
b8b3c87ba7 - renamed localization to location (that was confusing)
- renamed 'Locale' navigator to 'Location'
- produce Location navigation only if geolocation libraries are loaded
2012-06-14 09:44:14 +02:00
sixcooler
f64e78497a fix for reload-feature in Crawler_p 2012-06-14 02:13:23 +02:00
Michael Peter Christen
e89747bb67 - added automated generation of vocabularies from url stubs
- added clear of all terms for vocabularies
- added deletion of vocabularies
2012-06-13 15:53:18 +02:00
Michael Peter Christen
79464189a4 The 'Locale' vocabulary, which is generated from geo data, now has the
objectspace "http://dbpedia.org/resource/"
2012-06-13 13:05:41 +02:00
Michael Peter Christen
eca38c53e7 added a vocabulary editor 2012-06-13 12:12:20 +02:00
Michael Peter Christen
80e8aaabc8 moved new servlets into one submenu "Content Semantic" 2012-06-12 02:12:01 +02:00
Michael Peter Christen
2bbb6c52cf added option to clean the triplestore when deleting the index 2012-06-12 01:54:36 +02:00
Michael Peter Christen
8b53771db2 changed behavior of navigation processing:
- vocabulary annotation is no longer written into the metadata of urldb
- vocabularies are written into the jena triplestore using a rdf
vocabulary
- vocabularies for rdf triple must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- also added an Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
2012-06-11 23:49:30 +02:00
Michael Peter Christen
5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
2012-06-11 16:48:53 +02:00
cominch
c90f174799 preparation and generalization of augmented browsing methods 2012-06-11 09:23:44 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143

Conflicts:
	htroot/Supporter.java
	htroot/yacy/crawlReceipt.java
	htroot/yacy/transferRWI.java
	htroot/yacy/transferURL.java
	source/de/anomic/crawler/CrawlStacker.java
	source/de/anomic/data/ListManager.java
	source/net/yacy/peers/Protocol.java
	source/net/yacy/repository/Blacklist.java
	source/net/yacy/repository/LoaderDispatcher.java
	source/net/yacy/search/Switchboard.java
	source/net/yacy/search/index/MetadataRepository.java
	source/net/yacy/search/index/Segment.java
	source/net/yacy/search/query/RWIProcess.java
	source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Roland 'Quix0r' Haeder
213f006bf1 One is okay ...
Conflicts:
	htroot/Trails.html
2012-06-10 23:40:07 +02:00
Roland 'Quix0r' Haeder
af5a597e47 Scroogle is not coming back, remove dead code
Conflicts:
	source/net/yacy/search/Switchboard.java
2012-06-10 23:38:41 +02:00
cominch
7a4dab6d1d - removed unused variables
- do not replace malformed or invalid URLs in urlproxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835
6c8d7289-2bf4-0310-a012-ef5d649a1542

Conflicts:
	source/de/anomic/http/server/HTTPDFileHandler.java
2012-06-10 23:33:09 +02:00
Michael Peter Christen
90c6fc4b63 load all - but not the persistent local.rdf - triples from
DATA/TRIPLESTORE at startup time. The local.rdf is loaded only if the
persistent switch is on (as before).
2012-06-10 21:49:02 +02:00
Michael Peter Christen
a9eb40c160 fix for autocomplete in index.html 2012-06-10 14:44:37 +02:00
Michael Peter Christen
dd020a1a8a removed autocrawler and feedback servlet link since that was not
cherry-picked
2012-06-10 13:17:23 +02:00
cominch
aa0295917c augmentation
Conflicts:
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 13:10:21 +02:00
cominch
87a3fbb3c2 interaction javascript 2012-06-10 13:09:00 +02:00
cominch
ed2ea0f08e augmented browsing modification
Conflicts:
	htroot/interaction/OverlayInteraction.html
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 13:07:57 +02:00
cominch
d4802dc8d5 small change 2012-06-10 13:02:30 +02:00
cominch
a120ef660b RDF demo servlet 2012-06-10 13:02:11 +02:00
cominch
09a34cfe1b prepare RDF dump routines 2012-06-10 12:58:40 +02:00
cominch
300b235ce8 Updated Demo Servlet
Conflicts:
	htroot/About.html
	htroot/DemoServlet.html
	htroot/DemoServlet.java
	htroot/interaction/interaction.js
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:58:29 +02:00
cominch
90512640bf Added config switches for custom parser
Conflicts:
	source/net/yacy/document/TextParser.java
2012-06-10 12:49:36 +02:00
cominch
a12cbcba36 Add a global value store 2012-06-10 12:45:01 +02:00
cominch
e14f2881ae interaction: add special table interaction
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:41:16 +02:00
cominch
4e4e7a99f8 interaction: add global variable store
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:34:36 +02:00
cominch
bde07ed7a8 Add tagging overlay element
Conflicts:
	htroot/env/templates/jqueryheader.template
	htroot/yacysearchitem.java
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:28:50 +02:00
cominch
bee3bee8f3 Small fix - return value of JSON should be empty 2012-06-10 12:20:13 +02:00
cominch
ff4ba3ee05 Small fix
Conflicts:
	htroot/yacysearchitem.java
2012-06-10 10:56:39 +02:00
cominch
f05e3968f7 Quick fix 2012-06-10 10:55:09 +02:00
cominch
e859481889 Add Triplestore settings functionality
Conflicts:
	htroot/env/templates/header.template
2012-06-10 10:55:00 +02:00
cominch
b0bc0b4572 Add new demonstration module for client-side key-value store (backend:
triplestore): /DemoServletInteraction.html

Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 10:53:30 +02:00
cominch
c9dc6cda02 Demonstration: include value from interaction in search results
Conflicts:
	htroot/interaction/OverlayInteraction.html
	htroot/yacysearchitem.java
2012-06-10 10:51:53 +02:00
cominch
ae8adb0e58 Small changes 2012-06-10 10:44:16 +02:00
cominch
bcbd8eee33 Add several parsers, for RDFa and rdf files.
Conflicts:
	source/net/yacy/document/TextParser.java
2012-06-10 10:42:33 +02:00
cominch
9ef5a80f4e add interaction for triples and selector for augmented browsing
Conflicts:
	htroot/interaction/interaction.js
	source/net/yacy/interaction/Interaction.java
2012-06-10 10:38:54 +02:00
cominch
5d20cd324a Add Triplestore and RDF query interface
Conflicts:
	build.xml
	defaults/yacy.init
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:35:59 +02:00
cominch
bc9a618e0a augmented browsing: ignore js and css, integrate more user interaction
Conflicts:
	htroot/interaction/Footer.html
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:29:15 +02:00
cominch
9cbfc1a1c0 augmentedProxy, which forwards every proxy request to a
rewrite engine to customize existing webpages. originally implemented by
Florian Richter.

Conflicts:
	source/de/anomic/http/server/HTTPDProxyHandler.java
2012-06-10 10:15:34 +02:00
cominch
1626be7916 Add menu entries for urlproxy / augmented browsing 2012-06-10 09:59:30 +02:00
Michael Peter Christen
5b25272f40 added location search to main menu 2012-06-09 09:10:54 +02:00
Michael Peter Christen
ea0dceb55d bugfix: do not switch off standard memory strategy when performing a
forced GC
PLEASE CHECK if your peer has standard memory switched on!
2012-06-08 09:48:46 +02:00
Michael Peter Christen
dd14b19c26 lazy initialization of block rank table ... only normal web search uses
this. When interactive search or location search is used, the block rank
is switched off
2012-06-08 09:41:29 +02:00
Michael Peter Christen
701b9a28a0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	htroot/PerformanceMemory_p.java
2012-06-08 09:16:16 +02:00
Michael Peter Christen
ab7107b34b fixed RWIProcess queue limits: now discovering hidden results for mass
result retrieval
2012-06-08 09:14:54 +02:00
Michael Peter Christen
10c9c17d51 fixed handlemap spread factor and null iterator handling 2012-06-08 09:13:41 +02:00
Michael Peter Christen
a61f44f9e4 lazy initialization of block rank table.
the table is no longer initialized when no search is done. the effect is
strongest if YaCy is started headless, because then no browser pop-up
loads the search page and thereby triggers the initialization of the
table.
2012-06-07 13:16:38 +02:00
Michael Peter Christen
c8bbd180e4 enhanced hint for debian package automatic update 2012-06-07 12:36:26 +02:00
Michael Peter Christen
9ad84c5e9f fix for NPE in PerformanceMemory 2012-06-07 12:36:05 +02:00
Michael Peter Christen
96e9d77270 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
2012-06-06 20:13:28 +02:00