yacy_search_server/source/net/yacy/document/parser
reger 59c6532a65 add link extraction to pdfParser
this extracts clickable links in a pdf and adds them to the list of links

include a test case for this function

this is the corrected comment for commit:
aa2e15d846
2014-10-06 04:51:31 +02:00
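
The commit's code is not shown in this listing, so the following is only a minimal sketch of the general idea of collecting clickable link targets from a PDF. It assumes the link annotations are stored uncompressed and carry their target as a literal `/URI (...)` string. It uses a plain regex scan, which is a simplification and not necessarily how pdfParser.java does it. The class `PdfUriScanSketch` and the method `extractUris` are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of YaCy: scans raw PDF bytes for URI link actions. */
final class PdfUriScanSketch {

    // Matches "/URI (target)" inside a link action dictionary. The "/S /URI"
    // action-type entry is not followed by a parenthesised string, so it is skipped.
    private static final Pattern URI_ENTRY =
            Pattern.compile("/URI\\s*\\(((?:\\\\.|[^\\\\()])*)\\)");

    /** Returns the distinct link targets in order of first appearance. */
    static List<String> extractUris(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = URI_ENTRY.matcher(text);
        while (m.find()) {
            // Undo simple backslash escapes such as "\(" and "\)".
            // Octal escapes and compressed object streams are not handled here.
            String uri = m.group(1).replaceAll("\\\\(.)", "$1").trim();
            if (!uri.isEmpty()) {
                seen.add(uri);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String fragment = "<< /Type /Annot /Subtype /Link "
                + "/A << /S /URI /URI (http://example.org/a\\(1\\).pdf) >> >>";
        System.out.println(extractUris(fragment.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

A real parser would normally read the annotations through a PDF library, because most PDFs compress their objects and a byte scan like this one would then find nothing.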
augment - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
html fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
images fix image search expand box, cut-off of 2nd capture line height 2014-10-03 01:43:05 +02:00
rdfa added an option to set 'obey nofollow' for links with rel="nofollow" 2014-07-18 12:43:01 +02:00
xml do YaCy p2p connections using a timeout-request which covers the http 2014-01-19 15:21:23 +01:00
apkParser.java activated the new apk parser which was already ready but not included in 2014-09-24 13:32:58 +02:00
audioTagParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
bzipParser.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
csvParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
docParser.java extract author and keywords in .doc and .ppt parser 2014-06-29 02:54:09 +02:00
dwgParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
genericParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
gzipParser.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
htmlParser.java fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
linkScraperParser.java added linkScraperParser, a parser which ignores the text like the 2014-07-07 13:37:17 +02:00
mmParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
odtParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
ooxmlParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
pdfParser.java add link extraction to pdfParser 2014-10-06 04:51:31 +02:00
pptParser.java extract author and keywords in .doc and .ppt parser 2014-06-29 02:54:09 +02:00
psParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
rdfParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
rssParser.java simplify rssreader and improve atom feed link extraction 2014-08-10 01:29:16 +02:00
rtfParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
sevenzipParser.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
sidAudioParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
sitemapParser.java fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
swfParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
tarParser.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
torrentParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
vcfParser.java added an option to set 'obey nofollow' for links with rel="nofollow" 2014-07-18 12:43:01 +02:00
vsdParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
xlsParser.java - replaced the properties object in AnchorURL with distinct variables 2013-09-15 23:27:04 +02:00
zipParser.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00