Commit Graph

149 Commits

Author SHA1 Message Date
orbiter
474e29ce4a added options to configure the 'corporate identity'-icons, the home page link and the greeting line from
the skin menue. Additionally an example is given there how to integrate a search page with an iframe.
Please see the skin menu.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 23:37:04 +00:00
orbiter
c998dc6556 - added security functions to flush url and search caches in case that memory is full
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-16 21:39:58 +00:00
orbiter
994c609cf8 added new shell script to do a web search from the terminal
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 21:33:36 +00:00
orbiter
f5ef7f222e - fixed a bug in parser (directory paths had not been recognized)
- no access check when a search is made only local without snippet fetch
- added comment and status message in resourceObserver (this takes very long at startup time!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 09:54:58 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
orbiter
b21598bdd0 - enhanced handling of own IP address inside seed
- prevention of false information of own IP address
- enabled searching before an own IP address is assigned (before first ping happened)
- removed warning about limited search function
- added better time-out settings for peer-ping process (10 seconds complete, 5 seconds for back-ping)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-05 11:01:20 +00:00
orbiter
2a604b7402 added superfast search result computation which can be obtained for local search when snippet fetching is disabled. An example search for the rss interface would be:
http://localhost:8080/yacysearch.rss?query=yacy&Enter=Search&contentdom=text&count=10&resource=local&verify=false
(just add "&verify=false")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-03 23:06:01 +00:00
orbiter
d8277e6af1 - added parsing of numeric html entities for crawler
- fixed a bug in search response

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-24 10:26:24 +00:00
orbiter
0c173821fd more access security regarding database access and snippet retrieval: restrict number of results for not-authorized searchers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-23 09:45:33 +00:00
orbiter
53dfe9fe9a added RECENT command for search query:
add RECENT (in uppercase letters) to the search words and results will be ordered by date (recent first)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4825 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-18 21:29:43 +00:00
orbiter
3aa69dab94 prevent too high search request frequency submitted from the same peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-17 00:11:35 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
lotus
9bc56a9edc xss protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 16:37:13 +00:00
orbiter
b32736762c enhanced rssTerminal
- 3 lines possible
- distinguishing of private and public data, if not authorized only public data is shown
- shows now more events, including local searches in clear text if user is logged in
- simplyfied peer events
- better recognition of 'real' new peers
- presentation of peer pings from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 23:05:48 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in a indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switcing are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is, that a local search will no not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00
orbiter
2c0c8f0f0c SRU compliance according to
http://www.loc.gov/standards/sru/specs/search-retrieve.html
The example given on this page can be used to retrieve opensearch-compatible rss pages with YaCy

Try it:
The transcription to YaCys search servlet address is
http://localhost:8080/yacysearch.rss?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=dc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4730 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-23 16:16:41 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
93633abed8 - removed some debugging code from search process - should speed up now
- added some profiling code to search event - more time details in PerformanceSearch_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 00:55:04 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
b4ed937f1e - modified zone navigation (does still not work correctly)
- added dht switch in network definition
- 0.574

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4550 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 11:09:38 +00:00
orbiter
a7abee3578 - fixed some data types in new search stack
- added image domain presentation to image preview
- added new search page to menu
- added automatic re-search when an old search profile is requested and a crawl is ongoing,
  to fetch newly crawled entries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 23:40:38 +00:00
orbiter
61a81820e3 - refactoring of search tracker
- added link to search history to repeat the search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 23:35:48 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiplr calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
159aaf8889 re-introduced global search limitation when index receive is switched off
this was necessary because othervise robinson peers did also global searches, which cannot be a wanted effect

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-06 20:29:22 +00:00
orbiter
3c7b94c119 - fix for online caution delay settings, see
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=738&p=4723#p4723
- removed remote search limitation for non-dht-peers according to discussion in
  http://forum.yacy-websuche.de/viewtopic.php?f=15&t=793&hilit=&p=5277#p5277

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 20:11:50 +00:00
orbiter
a1e9e6e2e6 fix for search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:23:04 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
a8a5df4a51 - more dublin core naming of page metadata
- better presentation of result counters in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 21:58:30 +00:00
orbiter
efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
- refactoring of plasmaParserDocument to use Dublin Core - compatible property names
- redesign of url handling in parser and condenser (less String-to-yacyURL conversion)
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 11:51:43 +00:00
orbiter
45339c3db5 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-18 17:14:02 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
- more generics
- removed unused commons classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-08 20:12:31 +00:00
orbiter
270d016d89 fix for missing anonymization in search profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 18:57:43 +00:00
orbiter
5185acaf41 - reduced default search time
- this can be configured using the network definition file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4254 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-07 03:25:57 +00:00
orbiter
f243e338cf implemented online caution also for local and remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-06 21:53:17 +00:00
orbiter
6680634703 removed unnecessary functions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-06 21:29:47 +00:00
orbiter
b46bcaa5d8 changed method of profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-04 20:19:13 +00:00
orbiter
2fcd18a972 - fixed bad behaviour of search event worker processes
- fixed export of url lists in xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-23 01:08:16 +00:00
orbiter
445c0b5333 added domain list extraction and html export format
to URL administration menu http://localhost:8080/IndexControlURLs_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 20:47:06 +00:00
orbiter
c48b73cda2 redesign of ranking data structure
- the index administration now uses the same code base for url selection and collection
  as the search interface. The index administration is therefore a good test environment for
  ranking order control
- removed old postsorting-algorithms, will be replaced with new one
- fixed many bugs occurred before during ranking; especially the contraint filtering method
  removed too many links
- fixed media search flags; had been attached to too many urls. The effect should be a better
  pre-sorting before media load within snippet fetch

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-21 23:14:57 +00:00
orbiter
6f1308da2f - some enhancements to IndexControlURLs (shows more links, connects referrer to another query)
- some refactoring to search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-17 01:53:02 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster that before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- puNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML enities ('<' => '&lt;') which was done with put(...) before and so was called too often, becauses it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure, number formats work as expected _everywhere_, report overseen stuff http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
orbiter
9d539ec621 added option to display the network name as page greeting instead the page greeting string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4174 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 08:01:44 +00:00
fuchsi
ebfd1e0b42 remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-05 18:52:15 +00:00
fuchsi
ed20531e68 don't encode in channel element as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4144 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 12:12:27 +00:00
fuchsi
c5a8585ac6 fix more encooding problems in yacysearch.rss.
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encooded in a wrong way.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 09:21:03 +00:00