Commit Graph

8945 Commits

Author SHA1 Message Date
Michael Peter Christen
76d218fbef fixes to crawl profiles 2012-10-08 10:50:40 +02:00
Michael Peter Christen
2f536cb54d code cleanup: removed unused methods and made more methods and objects
private
2012-10-08 10:50:24 +02:00
Michael Peter Christen
584663ae8c - redesign of solr query construction
- fix for solr boosts and location search
- fix for number of search results in local search
2012-10-07 07:46:55 +02:00
Michael Peter Christen
6ab64746d7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-06 03:35:32 +02:00
Michael Peter Christen
a8167e6e5b clean-up: removed unused methods in kelondro 2012-10-06 03:34:52 +02:00
sof
5cb244b79b Merge remote branch 'origin/master' 2012-10-05 18:54:39 +02:00
apfelmaennchen
88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default because it
needs to store temporary files for non-file:// protocols, which some users might
dislike. For a local MP3 collection it nicely loads Artist,
Title, Album etc. from the audio files' metadata.
2012-10-05 18:54:26 +02:00
Michael Peter Christen
28bd3e62b1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-05 00:04:09 +02:00
orbiter
4fed4a86d8 another fix to location search 2012-10-04 22:44:44 +02:00
orbiter
507c612015 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-10-04 21:32:04 +02:00
reger
5650b0333e adjusted Netbeans-IDE classpath to current jars:
- changed solr jars to 3.6.1 (from 3.6.0)
- changed lucene jars to 3.6.1 (from 3.6.0)
- added jsoup-1.6.3
2012-10-04 21:12:09 +02:00
reger
b58e1f6d67 - add translation for ConfigHeuristics_p.html # section search-result
- removed old/unused scroogle text
2012-10-04 20:57:29 +02:00
orbiter
0f7a54452d fix for location search query encoding 2012-10-04 14:46:40 +02:00
Michael Peter Christen
679d562908 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-04 13:18:52 +02:00
sixcooler
9aa21506be bump to httpcore-4.2.2 (maintenance release) 2012-10-03 02:15:02 +02:00
Michael Peter Christen
31485a963d refactoring 2012-10-02 21:57:50 +02:00
Michael Peter Christen
406e1f3e7e added an option to start indexing right from the host browser 2012-10-02 21:18:27 +02:00
Michael Peter Christen
f8a3ab2d82 added the usage of synonyms to the GSA search interface 2012-10-02 14:29:45 +02:00
Michael Peter Christen
3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
synonyms_sxt
2012-10-02 11:13:06 +02:00
Michael Peter Christen
41ab2a2279 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:24:03 +02:00
orbiter
c8b1a693dc oops, added missing class for last commit 2012-10-02 10:23:10 +02:00
Michael Peter Christen
3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:14:09 +02:00
orbiter
3190347814 added a synonyms_t field to solr and a process to read synonym files.
This can be used to add another kind of stemming to solr, using stemming files
that express grammatical alternatives as synonyms. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed in {} (like the GSA synonyms
file)
- the file may contain comments, which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word found in a document, all of its synonyms are added
to a long text field which is stored in synonyms_t. Processes using
the synonyms must query that field as an optional matcher.
2012-10-02 00:02:50 +02:00
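The synonym file format described in the commit above can be illustrated with a small parser. This is a minimal Python sketch, not YaCy's actual (Java) implementation: the function names `parse_synonym_file`, `build_synonym_index` and `synonyms_field` are hypothetical, and lowercasing all words and excluding a word from its own synonym list are assumptions.

```python
def parse_synonym_file(text):
    """Parse synonym file text into a list of synonym groups (lists of words)."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip empty lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        # The list may optionally be enclosed in {} (GSA-style synonyms file).
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]
        # Each line is a comma-separated list of synonyms (lowercased: an assumption).
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        if words:
            groups.append(words)
    return groups


def build_synonym_index(groups):
    """Map each word to the set of the other words in its group(s)."""
    index = {}
    for group in groups:
        for word in group:
            index.setdefault(word, set()).update(w for w in group if w != word)
    return index


def synonyms_field(document_words, index):
    """Collect the synonyms of all words found in a document into one text value,
    analogous to the long text field stored in synonyms_t."""
    collected = []
    seen = set()
    for word in document_words:
        for syn in sorted(index.get(word.lower(), ())):
            if syn not in seen:
                seen.add(syn)
                collected.append(syn)
    return " ".join(collected)


text = "# stemming alternatives\n{walk, walks, walked, walking}\nhouse, houses\n"
index = build_synonym_index(parse_synonym_file(text))
print(synonyms_field(["house", "walked"], index))
```

Sorting each word's synonyms only makes the output deterministic; the real field content and order in YaCy may differ.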
Michael Peter Christen
411d0e839b added an underline text field to solr to record all underlined texts 2012-10-01 14:16:49 +02:00
orbiter
be4c96f3b1 The HostBrowser now offers to index files that are discovered because
they are linked in the web interface.
2012-09-30 13:23:06 +02:00
Michael Peter Christen
c4a3d8870f fixed computation of links in the host browser that are not indexed but
known to the crawler. Such links are now displayed in grey.
2012-09-29 02:13:11 +02:00
Michael Peter Christen
97a47319c8 added nice links to the host browser:
- click on the file icon to get the metadata of the file
- click on the link icon behind the link to open the original file in
the browser
2012-09-28 23:09:21 +02:00
Michael Peter Christen
f45f7fc12e added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space as one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
Michael Peter Christen
8556a3d521 extended solr connector with a method to retrieve a single facet. 2012-09-28 13:50:13 +02:00
Michael Peter Christen
d0015df61c added lucene memory library which is now necessary as solr has to
process more complex queries
2012-09-28 13:48:51 +02:00
Michael Peter Christen
80edd8ecd7 some more after-refactoring fixes 2012-09-28 10:24:57 +02:00
Michael Peter Christen
816cb6ce93 another fix for the debian installer: the installer failed because some
classes had unresolved dependencies. This fix removes the dependencies.
2012-09-28 09:00:40 +02:00
Michael Peter Christen
c461c28c5d fix for debian package installation (caused by refactoring) 2012-09-27 17:23:10 +02:00
Michael Peter Christen
280e36c90b allow Cross-Origin Resource Sharing for all stream servlets, that is, the
solr and the GSA search interfaces. This means that JavaScript running in
browsers can now access all YaCy search interfaces cross-origin, which
opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
2012-09-27 12:02:24 +02:00
Michael Peter Christen
ccd65ecf8d fixed URL search in IndexControlURLs_p.html; it now uses the solr
interface
2012-09-27 00:31:59 +02:00
Michael Peter Christen
016ffa7434 increased strength of crawling waves in network image 2012-09-26 23:32:13 +02:00
Michael Peter Christen
23f68f2a69 force usage of default faceting mechanisms for search 2012-09-26 18:48:59 +02:00
Michael Peter Christen
24d2ee3c52 - better date ranking
- more protection against NPE and time travel effects
2012-09-26 18:36:32 +02:00
Michael Peter Christen
ca313e404f - if a "/date" modifier is used, the solr remote query applies an
ordering by date (ascending)
- added also some 'anti-timetravel' protection (check if date is in the
future within any metadata date field)
2012-09-26 16:56:33 +02:00
Michael Peter Christen
a4214694df We now assume that no metadata storage other than solr is used.
Therefore a property like solrConnected() must always be true.
Removing this method also removes all write operations to the old
metadata index.
2012-09-26 16:05:11 +02:00
Michael Peter Christen
abab291162 made the index schema retrieval public and allow cross-domain retrieval 2012-09-26 15:44:50 +02:00
Michael Peter Christen
0cec7e761a enhanced snippet extractor to find snippets also inside tokens of a
URL
2012-09-26 15:33:37 +02:00
sixcooler
c65b576a6f added filename for missing crawlname when crawling from file 2012-09-26 14:05:33 +02:00
sixcooler
6c50d016ed pdf- and zipParser should not use forced Memory-Limits 2012-09-26 14:03:51 +02:00
Michael Peter Christen
562183932b - removed ip_s from default profile since it needs a DNS lookup to
create a document entry. This makes remote search much slower.
- removed synchronization of the add method if ip_s is activated, to prevent
a user configuration from causing bad behavior. The disadvantage is
that an index dump can cause data loss if indexing is running
during the dump
- caught more exceptions and more NPEs
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
2012-09-26 13:38:04 +02:00
Michael Peter Christen
24f4ca4d85 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-26 12:01:34 +02:00
apfelmaennchen
7efe9eb37b added a CORS access header to Network.xml to overcome the cross-domain
restriction (e.g. necessary to build a JavaScript YaCy
client).
2012-09-26 10:36:09 +02:00
apfelmaennchen
116f429e35 fix for java.lang.RuntimeException: TableColumnIndex not available... 2012-09-26 09:56:16 +02:00
Michael Peter Christen
5ac61591f3 better abstraction for solr query params 2012-09-25 23:59:30 +02:00
Michael Peter Christen
c913b2ba77 - fix for NPEs during remote solr configuration
- fixed remote solr setting switch
- added more logging
2012-09-25 23:59:09 +02:00