Michael Peter Christen
76d218fbef
fixes to crawl profiles
2012-10-08 10:50:40 +02:00
Michael Peter Christen
2f536cb54d
code cleanup: removed unused methods and made more methods and objects
private
2012-10-08 10:50:24 +02:00
Michael Peter Christen
584663ae8c
- redesign of solr query construction
- fix for solr boosts and location search
- fix for number of search results in local search
2012-10-07 07:46:55 +02:00
Michael Peter Christen
6ab64746d7
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-06 03:35:32 +02:00
Michael Peter Christen
a8167e6e5b
clean-up: removed unused methods in kelondro
2012-10-06 03:34:52 +02:00
sof
5cb244b79b
Merge remote branch 'origin/master'
2012-10-05 18:54:39 +02:00
apfelmaennchen
88b062210c
Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default because it
needs to store temporary files for non file:// protocols, which some
users might not want. For a local MP3 collection it loads Artist,
Title, Album etc. from the audio files' metadata.
2012-10-05 18:54:26 +02:00
Michael Peter Christen
28bd3e62b1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-05 00:04:09 +02:00
orbiter
4fed4a86d8
another fix to location search
2012-10-04 22:44:44 +02:00
orbiter
507c612015
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-10-04 21:32:04 +02:00
reger
5650b0333e
adjusted Netbeans-IDE classpath to current jars
change solr jars to 3.6.1 (from 3.6.0)
change lucene jars to 3.6.1 (from 3.6.0)
added jsoup-1.6.3
2012-10-04 21:12:09 +02:00
reger
b58e1f6d67
- add translation for ConfigHeuristics_p.html # section search-result
- removed old/unused scroogle text
2012-10-04 20:57:29 +02:00
orbiter
0f7a54452d
fix for location search query encoding
2012-10-04 14:46:40 +02:00
Michael Peter Christen
679d562908
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-04 13:18:52 +02:00
sixcooler
9aa21506be
bump to httpcore-4.2.2 (maintenance release)
2012-10-03 02:15:02 +02:00
Michael Peter Christen
31485a963d
refactoring
2012-10-02 21:57:50 +02:00
Michael Peter Christen
406e1f3e7e
added an option to start indexing right from the host browser
2012-10-02 21:18:27 +02:00
Michael Peter Christen
f8a3ab2d82
added the usage of synonyms to the GSA search interface
2012-10-02 14:29:45 +02:00
Michael Peter Christen
3d33a5bdf6
turned the synonyms_t Text field into a multi-valued String field
synonyms_sxt
2012-10-02 11:13:06 +02:00
Michael Peter Christen
41ab2a2279
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-02 10:24:03 +02:00
orbiter
c8b1a693dc
oops, added missing class for last commit
2012-10-02 10:23:10 +02:00
Michael Peter Christen
3b959ee002
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-02 10:14:09 +02:00
orbiter
3190347814
added a synonyms_t field to solr and a process to read synonym files.
This can be used to add another stemming to solr using stemming files
that are expressed as synonyms for grammatical alternatives. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed with {} (like the GSA synonyms
file)
- the file may contain comments which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word that is found in a document all synonyms are added
to a long text field which is stored into synonyms_t. Processes using
the synonyms must query with that field as an optional matcher.
2012-10-02 00:02:50 +02:00
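Note on the synonym file format described in the entry above: a minimal sketch of a line parser that follows the three stated rules (comma-separated list, optional enclosing {}, comment lines starting with '#'). The class and method names are hypothetical and this is not the code from the commit.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymLineParser {

    /**
     * Parses one line of a synonym file (hypothetical helper).
     * Returns an empty list for blank lines and '#' comment lines.
     */
    static List<String> parseLine(String line) {
        List<String> synonyms = new ArrayList<>();
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) return synonyms;
        // strip the optional enclosing braces (GSA-style synonym lists)
        if (s.startsWith("{") && s.endsWith("}")) {
            s = s.substring(1, s.length() - 1);
        }
        for (String word : s.split(",")) {
            String w = word.trim();
            if (!w.isEmpty()) synonyms.add(w);
        }
        return synonyms;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("{car, automobile, auto}"));
        System.out.println(parseLine("# comment lines yield no synonyms"));
    }
}
```

Whether the real reader normalizes case or handles malformed braces is not stated in the commit message, so the sketch leaves the words untouched.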
Michael Peter Christen
411d0e839b
added an underline text field to solr to record all underlined texts
2012-10-01 14:16:49 +02:00
orbiter
be4c96f3b1
The HostBrowser now offers to index files that are discovered because
they are linked in the web interface.
2012-09-30 13:23:06 +02:00
Michael Peter Christen
c4a3d8870f
fixed computation of links in host browser which are not indexed but
known by the crawler. Such links are now displayed in grey color.
2012-09-29 02:13:11 +02:00
Michael Peter Christen
97a47319c8
added nice links to the host browser:
- click on the file icon to get the metadata of the file
- click on the link icon behind the link to open the original file in
the browser
2012-09-28 23:09:21 +02:00
Michael Peter Christen
f45f7fc12e
added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
Michael Peter Christen
8556a3d521
extended solr connector with a method to retrieve a single facet.
2012-09-28 13:50:13 +02:00
Michael Peter Christen
d0015df61c
added lucene memory library which is now necessary as solr has to
process more complex queries
2012-09-28 13:48:51 +02:00
Michael Peter Christen
80edd8ecd7
some more after-refactoring fixes
2012-09-28 10:24:57 +02:00
Michael Peter Christen
816cb6ce93
another fix for the debian installer: the installer failed because some
classes had unresolved dependencies. This fix removes the dependencies.
2012-09-28 09:00:40 +02:00
Michael Peter Christen
c461c28c5d
fix for debian package installation (caused by refactoring)
2012-09-27 17:23:10 +02:00
Michael Peter Christen
280e36c90b
allow Cross-Origin Resource Sharing for all stream servlets, that is the
solr and the gsa search interface. That means that all JavaScript in
browsers now can Cross-Origin access all YaCy search interfaces, which
opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
2012-09-27 12:02:24 +02:00
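Note on the CORS change in the entry above: allowing cross-origin access from any browser script comes down to sending the standard Access-Control-Allow-Origin response header with the value "*". The sketch below shows only that idea on a plain header map. The class and method names are hypothetical, and it does not reproduce how the stream servlets actually set the header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CorsHeaders {

    /**
     * Returns a copy of the given response headers with a permissive
     * CORS header added (hypothetical helper, not YaCy's servlet code).
     */
    static Map<String, String> allowAllOrigins(Map<String, String> responseHeaders) {
        Map<String, String> out = new LinkedHashMap<>(responseHeaders);
        // "*" lets JavaScript from any origin read the response
        out.put("Access-Control-Allow-Origin", "*");
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/json");
        System.out.println(allowAllOrigins(headers));
    }
}
```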
Michael Peter Christen
ccd65ecf8d
fixed url search in IndexControlURLs_p.html / using now the solr
interface
2012-09-27 00:31:59 +02:00
Michael Peter Christen
016ffa7434
increased strength of crawling waves in network image
2012-09-26 23:32:13 +02:00
Michael Peter Christen
23f68f2a69
force usage of default faceting mechanisms for search
2012-09-26 18:48:59 +02:00
Michael Peter Christen
24d2ee3c52
- better date ranking
- more protection against NPE and time travel effects
2012-09-26 18:36:32 +02:00
Michael Peter Christen
ca313e404f
- if a "/date" modifier is used, the solr remote query applies an
ordering by date (ascending)
- added also some 'anti-timetravel' protection (check if date is in the
future within any metadata date field)
2012-09-26 16:56:33 +02:00
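Note on the 'anti-timetravel' check in the entry above: the commit message only says that metadata date fields are checked for dates in the future. A minimal sketch of such a check follows. The names are hypothetical, and clamping a future date to the current time is an assumption made for illustration; the message does not say what happens to such dates.

```java
import java.time.Instant;

public class DateSanity {

    /** True if the metadata date lies after the reference time. */
    static boolean isInFuture(Instant date, Instant now) {
        return date.isAfter(now);
    }

    /**
     * Assumption for illustration: replace a future date with "now"
     * so that it cannot outrank real documents in a date ordering.
     */
    static Instant clampToNow(Instant date, Instant now) {
        return isInFuture(date, now) ? now : date;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant tomorrow = now.plusSeconds(86400);
        System.out.println(isInFuture(tomorrow, now));
        System.out.println(clampToNow(tomorrow, now).equals(now));
    }
}
```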
Michael Peter Christen
a4214694df
We assert that no other metadata storage than solr is used now.
Therefore a property like solrConnected() must be true all the time.
Removal of this method causes removal of all write operations to the old
metadata index.
2012-09-26 16:05:11 +02:00
Michael Peter Christen
abab291162
made the index schema retrieval public and allow cross-domain retrieval
2012-09-26 15:44:50 +02:00
Michael Peter Christen
0cec7e761a
enhanced snippet extractor to find snippets also inside of tokens of a
URL
2012-09-26 15:33:37 +02:00
sixcooler
c65b576a6f
added filename for missing crawlname when crawling from file
2012-09-26 14:05:33 +02:00
sixcooler
6c50d016ed
pdf- and zipParser should not use forced Memory-Limits
2012-09-26 14:03:51 +02:00
Michael Peter Christen
562183932b
- removed ip_s from default profile since that needs a DNS lookup to
create a document entry. This makes remote search much slower.
- removed synchronization of add method if ip_s is activated to prevent
a user configuration from causing bad behavior. The disadvantage is that
an index dump can cause data loss if indexing is running during the
index dump
- caught more exceptions and more NPEs
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
2012-09-26 13:38:04 +02:00
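Note on the rows=0 remark in the entry above: a Solr select request with rows=0 returns no documents, but the response still reports the total hit count (numFound). The sketch below only builds such a query string with the standard library. The class name, method name and the /solr/select path are assumptions for illustration, not taken from the commit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrCountQuery {

    /**
     * Builds the parameter string for a count-only query
     * (hypothetical helper): the query itself plus rows=0.
     */
    static String countQuery(String query) {
        try {
            return "q=" + URLEncoder.encode(query, "UTF-8") + "&rows=0";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // assumed endpoint path; adjust to the actual Solr servlet location
        System.out.println("/solr/select?" + countQuery("*:*"));
    }
}
```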
Michael Peter Christen
24f4ca4d85
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-26 12:01:34 +02:00
apfelmaennchen
7efe9eb37b
adding CORS access header for Network.xml to overcome cross domain
restriction (e.g. necessary to build a JavaScript YaCy
client).
2012-09-26 10:36:09 +02:00
apfelmaennchen
116f429e35
fix for java.lang.RuntimeException: TableColumnIndex not available...
2012-09-26 09:56:16 +02:00
Michael Peter Christen
5ac61591f3
better abstraction for solr query params
2012-09-25 23:59:30 +02:00
Michael Peter Christen
c913b2ba77
- fix for NPEs during remote solr configuration
- fixed remote solr setting switch
- added more logging
2012-09-25 23:59:09 +02:00