1496 Commits

Author SHA1 Message Date
Michael Peter Christen
39317a6c66 enhanced webstructure image; introduced:
- multiple hosts can be listed (comma-separated) as host argument
- new 'bf' attribute (branch factor): the maximum number of edges per
node
- the bf value is computed automatically
- ordering of nodes when the graphic is drawn: the drawing mostly ends
at a limit, e.g. on the number of nodes. When this happens, it should be
ensured that the more 'interesting' nodes are painted first. This is
now done by sorting all nodes by the number of links they have in the
distant sub-graph.
2012-10-22 16:23:39 +02:00
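The node ordering described in this commit could look roughly like the following. This is a minimal sketch, not the code from the commit: the class name `NodeOrdering`, the method `mostInteresting`, and the `Map` of link counts are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NodeOrdering {

    /**
     * Returns at most maxNodes node names, ordered by descending link count.
     * linkCounts maps a node name to the number of links in its sub-graph
     * (a hypothetical input; the real data structure is not shown in the commit).
     */
    static List<String> mostInteresting(Map<String, Integer> linkCounts, int maxNodes) {
        List<String> nodes = new ArrayList<>(linkCounts.keySet());
        // Highest link count first; ties are broken by name so the result is deterministic.
        nodes.sort((a, b) -> {
            int c = Integer.compare(linkCounts.get(b), linkCounts.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        // Cut off at the node limit, so only the best-linked nodes get drawn.
        return new ArrayList<>(nodes.subList(0, Math.min(maxNodes, nodes.size())));
    }
}
```

Sorting before applying the limit means that whatever is dropped is the least-linked part of the graph, which matches the intent stated in the commit message.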
sixcooler
47ae7e322e smaller dhtDispatcher.cloudSize
@Orbiter: we talked about this some time ago - please revert if I'm wrong
2012-10-21 20:05:28 +02:00
sixcooler
57ddd63888 do not hold an expensive cache of references for DHT-out, but load them
on demand
see: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4530
2012-10-21 20:00:36 +02:00
Michael Peter Christen
ea27d2e5f6 fixed more getSolrFieldName usages 2012-10-18 15:21:05 +02:00
Michael Peter Christen
ce0e5b1e17 - more refactoring / private methods
- fix for usage of custom solr field names
2012-10-18 15:09:04 +02:00
Michael Peter Christen
ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
superfluous. The target is to make a solr document the core of YaCy
documents, which would allow many conversions to be removed. On the
way to this target the equivalence of URIMetadataRow and URIMetadataNode
had to be removed to expose the usage of the old URIMetadataRow data
structure.
This refactoring already removes unnecessary conversions and should
lower memory usage during indexing.
2012-10-18 14:29:11 +02:00
Michael Peter Christen
b400fc7b4d fix for file parser problem 2012-10-17 18:06:44 +02:00
Michael Peter Christen
e5b3c172ff removed hack which translated Solr documents to virtual RWI entries
which were then mixed with remote RWIs. Now these Solr documents are
fed into the result set as they appear during local and remote
search. That makes the search much faster.
2012-10-17 17:45:41 +02:00
Michael Peter Christen
6017691522 added an exception catch 2012-10-17 13:56:11 +02:00
Michael Peter Christen
5d16c23a1f specified more URIMetadata as URIMetadataNode 2012-10-16 18:26:21 +02:00
Michael Peter Christen
43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
URIMetadataNode which creates the opportunity to access Solr objects
directly and use their information richness
- lazy initialization of the URIMetadataNode object - should cause less
computation and memory usage during search.
- removed dead code
2012-10-16 18:11:57 +02:00
Michael Peter Christen
cc98496ff3 enhanced the HostBrowser:
- showing also outbound links to other domains if there are any
- the outbound links browser shows also the link structure image
- showing even inbound links if the web structure graph has information
about that
- removed the left menu and made the HostBrowser a part of the top menu
for search
- moved the file search also to the top menu
- added hover information in the HostBrowser to explain what the click
means
- because the HostBrowser also links to the Metadata viewer ViewFile,
there should be a button to switch back to the HostBrowser: added that
also.
2012-10-16 17:13:18 +02:00
Michael Peter Christen
21fe8339b4 - enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
4023d88b0b added date info in parser errors 2012-10-15 10:57:36 +02:00
Michael Peter Christen
1b02408936 use less cache 2012-10-11 14:32:37 +02:00
Michael Peter Christen
e45a3235e0 default cache size was much too high; decreased solr cache size 2012-10-11 12:03:48 +02:00
Michael Peter Christen
613cf7da7f enhancement to post argument parsing - possible fix to zero-filled
parameter values
2012-10-11 10:46:06 +02:00
Michael Peter Christen
36c13ed15b less solr prefetch 2012-10-11 10:17:05 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
53789555b9 fix for crawl start filter 2012-10-10 10:40:32 +02:00
orbiter
68d0f8de03 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-10-09 20:36:32 +02:00
reger
bfb0d4c69b - add language detection from <html lang="xx"> tag
- add jaudiotagger jar to Netbeans-IDE project classpath
2012-10-09 20:02:58 +02:00
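Reading the language from the `<html lang="xx">` tag can be sketched with a regular expression. This is only an illustration under stated assumptions: the class `HtmlLang`, the method `detect`, and the regex are hypothetical, and the actual commit presumably works inside YaCy's HTML parser rather than on a raw string.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlLang {

    // Matches the lang attribute of the opening <html> tag and captures
    // the primary language subtag (e.g. "de" from "de-DE").
    private static final Pattern LANG = Pattern.compile(
            "<html\\b[^>]*?\\blang\\s*=\\s*[\"']?([A-Za-z]{2,3})",
            Pattern.CASE_INSENSITIVE);

    /** Returns the lower-cased language code, or null if the tag declares none. */
    static String detect(String html) {
        Matcher m = LANG.matcher(html);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }
}
```

A declared language is only a hint from the page author, so a caller would probably treat a `null` or implausible result as a reason to fall back on other detection methods.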
Michael Peter Christen
7e3e45fd04 added Open Graph Metadata default fields, see http://ogp.me/ns# 2012-10-09 17:28:48 +02:00
Michael Peter Christen
c3e5f667a7 added schema.org breadcrumb counter to parser and solr schema 2012-10-09 13:02:43 +02:00
Michael Peter Christen
a06930662c replaced some more .getBytes() with UTF8/ASCII.getBytes() 2012-10-09 12:14:28 +02:00
Michael Peter Christen
bd769de604 since the solr index is now used for all pages that are indexed locally,
there is no need for the RWI index if the index is not transferred to
another peer. Therefore the creation of RWI index data is now suppressed
if DHT is disabled. This applies to all intranet and portal mode
configurations, but not to public robinson modes. A robinson may switch
back to public mode and then transmit its data. That means if someone
never wants to switch to DHT mode, it would be more appropriate to
choose the portal mode.
2012-10-09 11:48:55 +02:00
Michael Peter Christen
4b5e0c1500 added a URL rewriter which can be used to remove session ids from URLs 2012-10-09 11:24:48 +02:00
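What such a rewriter might do can be sketched as follows. The class `SessionIdStripper`, the method `strip`, and the list of parameter names (`jsessionid`, `phpsessid`, `sid`) are assumptions for illustration; the commit message does not say which ids the real rewriter removes or how it is configured.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class SessionIdStripper {

    // Hypothetical list of query parameters treated as session ids.
    private static final Set<String> SESSION_PARAMS =
            new HashSet<>(Arrays.asList("jsessionid", "phpsessid", "sid"));

    // Path-style ids such as ".../page;jsessionid=ABC123".
    private static final Pattern PATH_ID =
            Pattern.compile(";jsessionid=[^;/]*", Pattern.CASE_INSENSITIVE);

    static String strip(String url) {
        // Split off the fragment and the query so each part is handled separately.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) { fragment = url.substring(hash); url = url.substring(0, hash); }
        String query = null;
        int q = url.indexOf('?');
        if (q >= 0) { query = url.substring(q + 1); url = url.substring(0, q); }

        StringBuilder sb = new StringBuilder(PATH_ID.matcher(url).replaceAll(""));
        if (query != null) {
            char sep = '?';
            for (String param : query.split("&")) {
                if (param.isEmpty()) continue;
                String name = param.split("=", 2)[0].toLowerCase(Locale.ROOT);
                if (SESSION_PARAMS.contains(name)) continue; // drop the session id
                sb.append(sep).append(param);
                sep = '&';
            }
        }
        return sb.append(fragment).toString();
    }
}
```

Removing session ids lets a crawler recognize that two URLs differing only in the id point to the same page. A short name like `sid` can also be a legitimate parameter on some sites, so a real rewriter would likely make the list configurable.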
Michael Peter Christen
877042a6b5 fix for portal mode 2012-10-08 14:54:06 +02:00
Michael Peter Christen
76d218fbef fixes to crawl profiles 2012-10-08 10:50:40 +02:00
Michael Peter Christen
2f536cb54d code cleanup: removed unused methods and made more methods and objects
private
2012-10-08 10:50:24 +02:00
Michael Peter Christen
584663ae8c - redesign of solr query construction
- fix for solr boosts and location search
- fix for number of search results in local search
2012-10-07 07:46:55 +02:00
Michael Peter Christen
6ab64746d7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-06 03:35:32 +02:00
Michael Peter Christen
a8167e6e5b clean-up: removed unused methods in kelondro 2012-10-06 03:34:52 +02:00
sof
5cb244b79b Merge remote branch 'origin/master' 2012-10-05 18:54:39 +02:00
apfelmaennchen
88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default as it
needs to store temporary files for non-file:// protocols, which might be
unwanted. For your local MP3 collection it nicely loads Artist,
Title, Album etc. from the audio files' metadata.
2012-10-05 18:54:26 +02:00
Michael Peter Christen
28bd3e62b1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-05 00:04:09 +02:00
orbiter
4fed4a86d8 another fix to location search 2012-10-04 22:44:44 +02:00
orbiter
0f7a54452d fix for location search query encoding 2012-10-04 14:46:40 +02:00
Michael Peter Christen
31485a963d refactoring 2012-10-02 21:57:50 +02:00
Michael Peter Christen
f8a3ab2d82 added the usage of synonyms to the GSA search interface 2012-10-02 14:29:45 +02:00
Michael Peter Christen
3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
synonyms_sxt
2012-10-02 11:13:06 +02:00
Michael Peter Christen
41ab2a2279 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:24:03 +02:00
orbiter
c8b1a693dc oops, added missing class for last commit 2012-10-02 10:23:10 +02:00
Michael Peter Christen
3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:14:09 +02:00
orbiter
3190347814 added a synonyms_t field to solr and a process to read synonym files.
This can be used to add another stemming to solr using stemming files
that are expressed as synonyms for grammatical alternatives. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed with {} (like the GSA synonyms
file)
- the file may contain comments which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word that is found in a document all synonyms are added
to a long text field which is stored into synonyms_t. Processes using
the synonyms must query with that field as optional matcher.
2012-10-02 00:02:50 +02:00
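The file format listed in this commit (comma-separated synonyms per line, optional enclosing `{}`, comment lines starting with `#`) can be parsed line by line. The sketch below is an illustration only: the class `SynonymLines` and the method `parseLine` are hypothetical names, and details such as whitespace trimming are assumptions not stated in the commit message.

```java
import java.util.ArrayList;
import java.util.List;

final class SynonymLines {

    /** Parses one line of a synonym file; blank lines and comments give an empty list. */
    static List<String> parseLine(String line) {
        List<String> synonyms = new ArrayList<>();
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) return synonyms;
        // Strip the optional enclosing braces, as used in GSA synonym files.
        if (s.startsWith("{") && s.endsWith("}")) s = s.substring(1, s.length() - 1);
        for (String word : s.split(",")) {
            String w = word.trim(); // assumption: surrounding whitespace is not significant
            if (!w.isEmpty()) synonyms.add(w);
        }
        return synonyms;
    }
}
```

For example, `parseLine("{car, automobile, auto}")` yields the three words `car`, `automobile` and `auto`. Per the commit message, every word of such a group found in a document would cause the whole group to be added to the synonyms field.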
Michael Peter Christen
411d0e839b added an underline text field to solr to record all underlined texts 2012-10-01 14:16:49 +02:00
Michael Peter Christen
c4a3d8870f fixed computation of links in host browser which are not indexed but
known by the crawler. Such links are now displayed in grey color.
2012-09-29 02:13:11 +02:00
Michael Peter Christen
f45f7fc12e added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
Michael Peter Christen
8556a3d521 extended solr connector with a method to retrieve a single facet. 2012-09-28 13:50:13 +02:00
Michael Peter Christen
816cb6ce93 another fix for the debian installer: the installer fails because some
classes had unresolved dependencies. This fix removes the dependencies.
2012-09-28 09:00:40 +02:00