Commit Graph

273 Commits

Author SHA1 Message Date
Roland 'Quix0r' Haeder
fa08ed5ae5 Fixed a lot CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check 2011-12-29 00:33:16 +01:00
Michael Christen
ee9aae5cc0 more about CreativeCommons license vocabulary 2011-12-18 16:07:51 +01:00
Michael Christen
c75e1a3125 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2011-12-18 08:20:10 +01:00
Michael Christen
13f5b5f80d the component part in the YaCy Metadata is filled using the Dubling Core
vocabulary
2011-12-18 08:19:48 +01:00
Michael Peter Christen
8d2cbfb685 more vocabularies and more semantics for lod data structures 2011-12-18 08:12:34 +01:00
Michael Christen
9cd36b4c44 added vocabulary for geolocalization as used in georss 2011-12-17 02:03:45 +01:00
Michael Christen
66ab51f89d added rdf vocabulary 2011-12-17 01:09:16 +01:00
Michael Christen
c04bfaa51b refactoring 2011-12-16 23:59:29 +01:00
Michael Peter Christen
136b514f52 added a Triple Store based on Nodes that fit to the new storage classes.
Added also a first Vocabulary for the node store - Dublin Core.
2011-12-16 23:01:47 +01:00
Michael Christen
1f4afb4dc0 performance hacks 2011-12-15 15:15:53 +01:00
Michael Peter Christen
4243ace863 added phonetic classes 2011-12-14 17:33:18 +01:00
Michael Christen
204c29f010 small bugfixes for search result display and cache display 2011-12-10 01:35:38 +01:00
orbiter
35a9e8f307 - fixed network graphic
- debuged evaluation tables
- changed cache settings in template engine
- some speed hacks
- changed int angles for peer positions in network graphic to double angles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-30 20:50:41 +00:00
orbiter
e22f8497c9 - tested the ARC methods
- removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method)
- increased speed of CrawlResults page (no dns lookup any more)
- increased speed of favicon display (removed dns lookup)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 14:09:25 +00:00
orbiter
5a55397f99 some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
804e48888b smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
should also be a little bit faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-18 13:09:07 +00:00
orbiter
78ce3b13be typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-10 11:57:26 +00:00
orbiter
0d858d48ec replaced String with StringBuilder in suggestion process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-09 14:42:55 +00:00
orbiter
37e35f2741 normalization of url using urlencoding/decoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-08 12:02:22 +00:00
orbiter
e58438c01c - added a new retry connector for solr (for cases where solr responses are slow)
- added a new exist property into the metadataRepository which includes solr entries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-08 11:49:04 +00:00
orbiter
94eab08794 - updated opensearchdescription text and icon
- removed automatic setting of maxitems during search (can be set now elsewhere)
- updated RSSMessage.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-30 01:09:38 +00:00
orbiter
a9838f8b99 fix for http://bugs.yacy.net/view.php?id=59
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-12 22:26:48 +00:00
orbiter
1b45e33f04 added robots tag parser to solr scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 13:39:01 +00:00
orbiter
cf4fd525ee added directDocByURL attribute in crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 12:38:28 +00:00
orbiter
813f297a95 another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 08:26:31 +00:00
orbiter
b250e6466d implemented crawl restrictions for IP pattern and country lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-29 15:17:39 +00:00
orbiter
47a8c69745 added a new feature to MultiProtocolURIs to get the locale for each url:
This is done using a new library InetAddressLocator.jar which is NOT added by default to YaCy because it is very old and with that library we will never get a debian package. However, some people want that functionality and it can be made available if the library is taken from http://javainetlocator.sourceforge.net/ and placed into the /lib directory where it will be found using reflection.
The new feature will be used to extend the crawler steering.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7975 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-27 15:26:14 +00:00
orbiter
2c3161b4ac refactoring:
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-26 21:42:28 +00:00
orbiter
6b22865dbc - removed some warinings
- removed a dead update location

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-24 01:58:54 +00:00
orbiter
0c6d95e57b - more tolerance against failure of table opening
- more connections for solrj

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-21 15:08:05 +00:00
orbiter
6b02b696b0 - add number of search results to end of rss and json output to reflect latest status of retrieval
- distinguish search access with different verify state in access of search cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 19:41:44 +00:00
orbiter
aaf7a0feaa yet another cache strategy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-15 22:40:01 +00:00
orbiter
734059d33e performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 23:34:05 +00:00
orbiter
23e81b28b2 synchronization enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 21:19:02 +00:00
orbiter
85a5487d6d YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-13 14:39:41 +00:00
orbiter
2cba860693 - fix for wrong entries in NOLOAD indexing queue (that caused that urls had been only indexed based on their url and not loaded)
- patch for better urls to solr admin interface

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-08 12:23:55 +00:00
orbiter
a70dbce41c added another file tool class to yacy-cora
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:09:35 +00:00
orbiter
e02bfbde56 fix for solr url
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 09:07:40 +00:00
orbiter
580beb12a5 reverting SVN 7863; the synchronization was needed and no synchronization causes repeated DNS lookup for the same hosts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-05 00:26:27 +00:00
orbiter
1c007188ad bugfixes in html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-31 16:02:06 +00:00
low012
30a8a2f76b *) replacing one ugly hack with an extended ugly hack ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 13:32:42 +00:00
low012
95379ce0b1 *) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-28 12:59:11 +00:00
orbiter
45e497a9bd fix for term iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 18:29:30 +00:00
orbiter
5dd2efc9a2 - bugfixes in html parser
- new fields in solr
- extended file viewer to debug parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 15:52:25 +00:00
orbiter
2c595a6a47 added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 10:35:25 +00:00
orbiter
41a8ee4569 added iterable implementation in KeyList
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:23:40 +00:00
orbiter
22d69a6368 refactoring in cora: added sorting package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-23 20:18:30 +00:00
orbiter
51cf697acd refactoring: moved all score-related classes to new ranking package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:37:53 +00:00
orbiter
a0d5e7b6e6 added new score comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 22:33:08 +00:00
sixcooler
63a375b801 do not look at external dtd, cause this make this reader stay forewer(?) on on faulty dtd-locations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:45:27 +00:00