orbiter
ae246c30c3
fixed interpretation of directDocByURL attribute during crawl start
2012-10-09 23:11:31 +02:00
orbiter
68d0f8de03
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-10-09 20:36:32 +02:00
reger
bfb0d4c69b
- add language detection from <html lang="xx"> tag
- add jaudiotagger jar to Netbeans-IDE project classpath
2012-10-09 20:02:58 +02:00
Michael Peter Christen
7e3e45fd04
added Open Graph Metadata default fields, see http://ogp.me/ns#
2012-10-09 17:28:48 +02:00
Michael Peter Christen
c3e5f667a7
added schema.org breadcrumb counter to parser and solr schema
2012-10-09 13:02:43 +02:00
Michael Peter Christen
a06930662c
replaced some more .getBytes() with UTF8/ASCII.getBytes()
2012-10-09 12:14:28 +02:00
Michael Peter Christen
bd769de604
since the solr index is now used for all pages that are indexed locally,
there is no need for the RWI index if the index is not transferred to
another peer. Therefore the creation of RWI index data is now suppressed
if DHT is disabled. This applies to all intranet and portal mode
configurations, but not to public robinson modes. A robinson may switch
back to public mode and then transmit its data. That means if someone
never wants to switch to DHT mode, it would be more appropriate to
choose the portal mode.
2012-10-09 11:48:55 +02:00
Michael Peter Christen
554db5608b
fix for ViewFile
2012-10-09 11:25:05 +02:00
Michael Peter Christen
4b5e0c1500
added an url rewriter which can be used to remove session ids from urls
2012-10-09 11:24:48 +02:00
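As an illustration of what such a rewriter might do, here is a minimal Python sketch. It is not the code from this commit: the function name `strip_session_ids`, the list of parameter names, and the `;jsessionid=` path pattern are all assumptions chosen for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of session-id parameter names (assumption, not from the commit).
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}

# Matches a ";jsessionid=..." path parameter as appended by servlet containers.
PATH_SESSION = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def strip_session_ids(url: str) -> str:
    """Return the URL without session ids in the path or the query string."""
    parts = urlsplit(url)
    # Remove a session id that is embedded in the path.
    path = PATH_SESSION.sub("", parts.path)
    # Keep every query parameter whose name is not a known session-id name.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )
```

Normalizing URLs this way lets two links that differ only in their session id map to the same document, so a crawler does not load the same page twice.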
orbiter
9190599d21
use links in AccessTracker
2012-10-08 19:47:14 +02:00
Michael Peter Christen
877042a6b5
fix for portal mode
2012-10-08 14:54:06 +02:00
Michael Peter Christen
42e525ca9a
enhanced the host browser
2012-10-08 14:00:14 +02:00
Michael Peter Christen
76d218fbef
fixes to crawl profiles
2012-10-08 10:50:40 +02:00
Michael Peter Christen
2f536cb54d
code cleanup: removed unused methods and made more methods and objects
private
2012-10-08 10:50:24 +02:00
Michael Peter Christen
584663ae8c
- redesign of solr query construction
- fix for solr boosts and location search
- fix for number of search results in local search
2012-10-07 07:46:55 +02:00
Michael Peter Christen
6ab64746d7
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-06 03:35:32 +02:00
Michael Peter Christen
a8167e6e5b
clean-up: removed unused methods in kelondro
2012-10-06 03:34:52 +02:00
sof
5cb244b79b
Merge remote branch 'origin/master'
2012-10-05 18:54:39 +02:00
apfelmaennchen
88b062210c
Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default as it
needs to store temporary files for non file:// protocols, which might be
disliked. For your local MP3 collection it nicely loads Artist,
Title, Album etc. from the audio files' metadata.
2012-10-05 18:54:26 +02:00
Michael Peter Christen
28bd3e62b1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-05 00:04:09 +02:00
orbiter
4fed4a86d8
another fix to location search
2012-10-04 22:44:44 +02:00
orbiter
507c612015
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-10-04 21:32:04 +02:00
reger
5650b0333e
adjusted Netbeans-IDE classpath to current jars
change solr jars to 3.6.1 (from 3.6.0)
change lucene jars to 3.6.1 (from 3.6.0)
added jsoup-1.6.3
2012-10-04 21:12:09 +02:00
reger
b58e1f6d67
- add translation for ConfigHeuristics_p.html # section search-result
- removed old/unused scroogle text
2012-10-04 20:57:29 +02:00
orbiter
0f7a54452d
fix for location search query encoding
2012-10-04 14:46:40 +02:00
Michael Peter Christen
679d562908
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-04 13:18:52 +02:00
sixcooler
9aa21506be
bump to httpcore-4.2.2 (maintenance release)
2012-10-03 02:15:02 +02:00
Michael Peter Christen
31485a963d
refactoring
2012-10-02 21:57:50 +02:00
Michael Peter Christen
406e1f3e7e
added an option to start indexing right from the host browser
2012-10-02 21:18:27 +02:00
Michael Peter Christen
f8a3ab2d82
added the usage of synonyms to the GSA search interface
2012-10-02 14:29:45 +02:00
Michael Peter Christen
3d33a5bdf6
turned the synonyms_t Text field into a multi-valued String field
synonyms_sxt
2012-10-02 11:13:06 +02:00
Michael Peter Christen
41ab2a2279
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-02 10:24:03 +02:00
orbiter
c8b1a693dc
oops, added missing class for last commit
2012-10-02 10:23:10 +02:00
Michael Peter Christen
3b959ee002
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-10-02 10:14:09 +02:00
orbiter
3190347814
added a synonyms_t field to solr and a process to read synonym files.
This can be used to add another stemming to solr using stemming files
that are expressed as synonyms for grammatical alternatives. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed with {} (like the GSA synonyms
file)
- the file may contain comments which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word that is found in a document all synonyms are added
to a long text field which is stored into synonyms_t. Processes using
the synonyms must query with that field as optional matcher.
2012-10-02 00:02:50 +02:00
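The file format described in the commit above can be sketched as a small Python parser. This is only an illustration under stated assumptions, not YaCy's implementation: the names `parse_synonym_lines` and `synonyms_for_text` are hypothetical, and lower-casing and whitespace tokenization are simplifications added for the example.

```python
from collections import defaultdict


def parse_synonym_lines(lines):
    """Map each word to the set of its synonyms.

    Each line is a comma-separated list of synonyms, optionally enclosed
    in {}; lines starting with '#' are comments.
    """
    synonyms = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]  # drop the optional GSA-style braces
        # Lower-casing is an assumption made for this sketch.
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in words:
            synonyms[word].update(w for w in words if w != word)
    return dict(synonyms)


def synonyms_for_text(text, synonyms):
    """Collect the synonyms of every word found in the text, sorted."""
    found = set()
    for token in text.lower().split():
        found.update(synonyms.get(token, ()))
    return sorted(found)
```

For example, with the lines `{house, houses, housing}` and `run,runs,running`, the text "The house runs" yields `houses`, `housing`, `run` and `running`. Joined with spaces, these are the kind of terms the commit describes storing in the synonyms field.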
Michael Peter Christen
411d0e839b
added an underline text field to solr to record all underlined texts
2012-10-01 14:16:49 +02:00
orbiter
be4c96f3b1
The HostBrowser now offers to index files that are discovered because
they are linked in the web interface.
2012-09-30 13:23:06 +02:00
Michael Peter Christen
c4a3d8870f
fixed computation of links in host browser which are not indexed but
known by the crawler. Such links are now displayed in grey color.
2012-09-29 02:13:11 +02:00
Michael Peter Christen
97a47319c8
added nice links to the host browser:
- click on the file icon to get the metadata of the file
- click on the link icon behind the link to open the original file in
the browser
2012-09-28 23:09:21 +02:00
Michael Peter Christen
f45f7fc12e
added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
Michael Peter Christen
8556a3d521
extended solr connector with a method to retrieve a single facet.
2012-09-28 13:50:13 +02:00
Michael Peter Christen
d0015df61c
added lucene memory library which is now necessary as solr has to
process more complex queries
2012-09-28 13:48:51 +02:00
Michael Peter Christen
80edd8ecd7
some more after-refactoring fixes
2012-09-28 10:24:57 +02:00
Michael Peter Christen
816cb6ce93
another fix for the debian installer: the installer fails because some
classes had unresolved dependencies. This fix removes the dependencies.
2012-09-28 09:00:40 +02:00
Michael Peter Christen
c461c28c5d
fix for debian package installation (caused by refactoring)
2012-09-27 17:23:10 +02:00
Michael Peter Christen
280e36c90b
allow Cross-Origin Resource Sharing for all stream servlets, that is the
solr and the gsa search interface. That means that JavaScript running in
any browser can now access all YaCy search interfaces cross-origin, which
opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
2012-09-27 12:02:24 +02:00
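To show what allowing Cross-Origin Resource Sharing means at the HTTP level, here is a minimal Python sketch of a handler that adds the `Access-Control-Allow-Origin` response header. It does not reproduce YaCy's servlets: the wildcard origin, the handler name `SearchHandler`, and the empty JSON body are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler


def cors_headers(origin="*"):
    """Headers that permit cross-origin reads; '*' allows any origin."""
    return {"Access-Control-Allow-Origin": origin}


class SearchHandler(BaseHTTPRequestHandler):
    """Hypothetical search endpoint that answers every GET with CORS enabled."""

    def do_GET(self):
        # Placeholder payload; a real endpoint would return search results.
        body = json.dumps({"items": []}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in cors_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

Passing `SearchHandler` to `http.server.HTTPServer` serves it locally. With the header present, a script loaded from another origin can read the response, which is what makes a purely browser-based search client possible.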
Michael Peter Christen
ccd65ecf8d
fixed url search in IndexControlURLs_p.html / using now the solr
interface
2012-09-27 00:31:59 +02:00
Michael Peter Christen
016ffa7434
increased strength of crawling waves in network image
2012-09-26 23:32:13 +02:00
Michael Peter Christen
23f68f2a69
force usage of default faceting mechanisms for search
2012-09-26 18:48:59 +02:00
Michael Peter Christen
24d2ee3c52
- better date ranking
- more protection against NPE and time travel effects
2012-09-26 18:36:32 +02:00