author | Jörg Frings-Fürst <debian@jff-webhosting.net> | 2017-10-01 20:19:42 +0200 |
---|---|---|
committer | Jörg Frings-Fürst <debian@jff-webhosting.net> | 2017-10-01 20:19:42 +0200 |
commit | b8757ee40b25ad8dac375c35f23785982985c215 (patch) | |
tree | 6ede28c8add650f5fcef3c920de66da2e0cd7c80 /README.md | |
parent | 775e974b25e1c055ecaba84c50e1fed12e57a82d (diff) | |
parent | 9a955c414e34de441b5f188520314d54e3c5b3c5 (diff) |
Merge branch 'feature/upstream' into develop
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 89 |
1 files changed, 58 insertions, 31 deletions
@@ -19,20 +19,23 @@ Some examples:
 <code>
 sites = [
-         {'shortname': 'mywebsite1',
-          'uri': 'http://www.mywebsite1.com/info',
-          'contentcss': 'div'},
-
-         {'shortname': 'mywebsite2',
-          'uri': 'http://www.mywebsite2.com/info',
-          'contentxpath': '//*[contains(concat(\' \', normalize-space(@class), \' \'), \' news-list-container \')]',
-          'titlexpath': '//title'},
-
-         {'shortname': 'mywebsite3',
-          'uri': 'http://www.mywebsite3.com/info',
-          'type': 'text',
-          'contentregex': 'Version\"\:\d*\.\d*',
-          'user-agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'}
+         {'name': 'example-css',
+          'parsers': [uri(uri='https://github.com/mtill', contenttype='html'),
+                      css(contentcss='div')
+                     ]
+         },
+
+         {'name': 'example-xpath',
+          'parsers': [uri(uri='https://example-webpage.com/test', contenttype='html'),
+                      xpath(contentxpath='//div[contains(concat(\' \', normalize-space(@class), \' \'), \' package-version-header \')]')
+                     ]
+         },
+
+         {'name': 'my-script',
+          'parsers': [command(command='/home/user/script.sh', contenttype='text'),
+                      regex(contentregex='^.*$')
+                     ]
+         }
 ]
 </code>
@@ -40,31 +43,55 @@ sites = [
  * parameters:
-   * <b>shortname</b>
-     short name of the entry, used as an identifier when sending email notifications
+   * <b>name</b>
+     name of the entry, used as an identifier when sending email notifications
+   * <b>receiver</b> (optional)
+     Overrides global receiver specification.
+
+ * parameters for the URL receiver:
+
    * <b>uri</b>
-     URI of the website; If the scheme of the uri is 'cmd://', the string is interpreted as a command and the standard output (stdout) is parsed.
-   * <b>type</b> (optional; default: 'html')
+     URI of the website
+   * <b>contenttype</b> (optional; default: 'html')
      content type, e.g., 'xml'/'html'/'text'.
-   * <b>contentxpath</b> / <b>titlexpath</b> (optional)
-     XPath expression for the content/title sections to extract. If you prefer, you could use contentcss/titlecss instead.
-   * <b>contentcss</b> / <b>titlecss</b> (optional)
-     CSS expression for the content/title sections to extract. This is ignored if there is a corresponding XPath definition.
-   * <b>contentregex</b> / <b>titleregex</b> (optional)
-     Regular expression. If XPath/CSS selector is defined, the regular expression is applied afterwards.
-   * <b>encoding</b> (optional; default: 'utf-8')
+   * <b>enc</b> (optional; default: 'utf-8')
      Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
-   * <b>splitregex</b> (optional)
-     only works if type is set to 'text'; defines that content should be split to chunks based on the defined regex expression.
-   * <b>receiver</b> (optional)
-     Overrides global receiver specification.
-   * <b>user-agent</b> (optional)
+   * <b>userAgent</b> (optional)
      Defines the user agent string, e.g.,
-       'user-agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'
+       'userAgent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'
    * <b>accept</b> (optional)
      Defines the accept string, e.g.,
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
+ * parameters for the Command receiver
+
+   * <b>command</b>
+     the command
+   * <b>contenttype</b> (optional; default: 'text')
+     content type, e.g., 'xml'/'html'/'text'.
+   * <b>enc</b> (optional; default: 'utf-8')
+     Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
+
+ * parameters for the XPath parser:
+
+   * <b>contentxpath</b>
+     XPath expression for the content sections to extract
+   * <b>titlexpath</b> (optional)
+     XPath expression for the title sections to extract
+
+ * parameters for the CSS parser:
+
+   * <b>contentcss</b>
+     CSS expression for the content sections to extract
+   * <b>titlecss</b> (optional)
+     CSS expression for the title sections to extract
+
+ * parameters for the RegEx parser:
+
+   * <b>contentregex</b>
+     Regular expression for content parsing
+   * <b>titleregex</b> (optional)
+     Regular expression for title parsing
  * We collect some XPath/CSS snippets at this place: <a href="https://github.com/Debianguru/MailWebsiteChanges/wiki/snippets">Snippet collection</a> - please feel free to add your own definitions!
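
The new configuration format in this diff replaces flat per-site keys with a `parsers` list, where a receiver (`uri` or `command`) is followed by one or more parsers (`css`, `xpath`, `regex`). The README hunk does not show how the list is evaluated, so the following is only a rough sketch of the chaining idea, assuming each stage consumes the previous stage's output. `StaticText`, `RegexStage`, `perform`, and `run_pipeline` are hypothetical names chosen for illustration and are not taken from MailWebsiteChanges.

```python
import re


class StaticText:
    """Hypothetical stand-in for a receiver such as uri(...) or command(...)."""

    def __init__(self, text):
        self.text = text

    def perform(self, chunks):
        # A receiver ignores its input and produces the initial content.
        return [self.text]


class RegexStage:
    """Hypothetical stand-in for a parser such as regex(contentregex=...)."""

    def __init__(self, contentregex):
        self.pattern = re.compile(contentregex, re.MULTILINE)

    def perform(self, chunks):
        # Every match found in an incoming chunk becomes its own output chunk.
        out = []
        for chunk in chunks:
            out.extend(m.group(0) for m in self.pattern.finditer(chunk))
        return out


def run_pipeline(parsers):
    """Run the stages in list order, feeding each one the previous result."""
    chunks = []
    for parser in parsers:
        chunks = parser.perform(chunks)
    return chunks


# Same shape as a 'sites' entry in the diff, but built from the stand-ins above.
site = {'name': 'my-script',
        'parsers': [StaticText('Version: 1.2\nBuild: 77\n'),
                    RegexStage(r'^Version: \d+\.\d+$')]}

result = run_pipeline(site['parsers'])
print(site['name'], result)
```

Keeping the receiver and the parsers behind one shared method is what lets a single list express both where the content comes from and how it is narrowed down; the real project may structure this differently.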