Diffstat (limited to 'README.md')
-rw-r--r-- README.md 89
1 files changed, 58 insertions, 31 deletions
diff --git a/README.md b/README.md
index 8e78da6..69718e7 100644
--- a/README.md
+++ b/README.md
@@ -19,20 +19,23 @@ Some examples:
<code>
sites = [
- {'shortname': 'mywebsite1',
- 'uri': 'http://www.mywebsite1.com/info',
- 'contentcss': 'div'},
-
- {'shortname': 'mywebsite2',
- 'uri': 'http://www.mywebsite2.com/info',
- 'contentxpath': '//*[contains(concat(\' \', normalize-space(@class), \' \'), \' news-list-container \')]',
- 'titlexpath': '//title'},
-
- {'shortname': 'mywebsite3',
- 'uri': 'http://www.mywebsite3.com/info',
- 'type': 'text',
- 'contentregex': 'Version\"\:\d*\.\d*',
- 'user-agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'}
+ {'name': 'example-css',
+ 'parsers': [uri(uri='https://github.com/mtill', contenttype='html'),
+ css(contentcss='div')
+ ]
+ },
+
+ {'name': 'example-xpath',
+ 'parsers': [uri(uri='https://example-webpage.com/test', contenttype='html'),
+ xpath(contentxpath='//div[contains(concat(\' \', normalize-space(@class), \' \'), \' package-version-header \')]')
+ ]
+ },
+
+ {'name': 'my-script',
+ 'parsers': [command(command='/home/user/script.sh', contenttype='text'),
+ regex(contentregex='^.*$')
+ ]
+ }
]
</code>
@@ -40,31 +43,55 @@ sites = [
* parameters:
- * <b>shortname</b>
- short name of the entry, used as an identifier when sending email notifications
+ * <b>name</b>
+ name of the entry, used as an identifier when sending email notifications
+ * <b>receiver</b> (optional)
+ Overrides global receiver specification.
+
+ * parameters for the URL receiver:
+
* <b>uri</b>
- URI of the website; If the scheme of the uri is 'cmd://', the string is interpreted as a command and the standard output (stdout) is parsed.
- * <b>type</b> (optional; default: 'html')
+ URI of the website
+ * <b>contenttype</b> (optional; default: 'html')
content type, e.g., 'xml'/'html'/'text'.
- * <b>contentxpath</b> / <b>titlexpath</b> (optional)
- XPath expression for the content/title sections to extract. If you prefer, you could use contentcss/titlecss instead.
- * <b>contentcss</b> / <b>titlecss</b> (optional)
- CSS expression for the content/title sections to extract. This is ignored if there is a corresponding XPath definition.
- * <b>contentregex</b> / <b>titleregex</b> (optional)
- Regular expression. If XPath/CSS selector is defined, the regular expression is applied afterwards.
- * <b>encoding</b> (optional; default: 'utf-8')
+ * <b>enc</b> (optional; default: 'utf-8')
Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
- * <b>splitregex</b> (optional)
- only works if type is set to 'text'; defines that content should be split to chunks based on the defined regex expression.
- * <b>receiver</b> (optional)
- Overrides global receiver specification.
- * <b>user-agent</b> (optional)
+ * <b>userAgent</b> (optional)
Defines the user agent string, e.g.,
- 'user-agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'
+ 'userAgent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0'
* <b>accept</b> (optional)
Defines the accept string, e.g.,
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
+ * parameters for the Command receiver
+
+ * <b>command</b>
+ the command
+ * <b>contenttype</b> (optional; default: 'text')
+ content type, e.g., 'xml'/'html'/'text'.
+ * <b>enc</b> (optional; default: 'utf-8')
+ Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
+
+ * parameters for the XPath parser:
+
+ * <b>contentxpath</b>
+ XPath expression for the content sections to extract
+ * <b>titlexpath</b> (optional)
+ XPath expression for the title sections to extract
+
+ * parameters for the CSS parser:
+
+ * <b>contentcss</b>
+ CSS expression for the content sections to extract
+ * <b>titlecss</b> (optional)
+ CSS expression for the title sections to extract
+
+ * parameters for the RegEx parser:
+
+ * <b>contentregex</b>
+ Regular expression for content parsing
+ * <b>titleregex</b> (optional)
+ Regular expression for title parsing
* We collect some XPath/CSS snippets at this place: <a href="https://github.com/Debianguru/MailWebsiteChanges/wiki/snippets">Snippet collection</a> - please feel free to add your own definitions!
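
The new format in this diff puts a receiver (`uri` or `command`) first in each `'parsers'` list, followed by one or more parsers (`css`, `xpath`, `regex`). Below is a minimal sketch of how such a chain could be evaluated, assuming each entry is a callable that maps a list of chunks to a new list. `Content`, `static_source`, `regex_parser`, and `run_chain` are hypothetical names used only for illustration; they are not taken from the project's code, and the real classes may behave differently.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Content:
    """Hypothetical container for one extracted chunk."""
    title: str
    content: str


# Assumption: a chain entry is modelled as a function from chunks to chunks.
Parser = Callable[[List[Content]], List[Content]]


def static_source(text: str) -> Parser:
    """Stand-in for a receiver such as uri(...) or command(...).

    It ignores its input and yields one chunk holding fixed text,
    so the sketch runs without network access or a shell script.
    """
    def parse(_chunks: List[Content]) -> List[Content]:
        return [Content(title="", content=text)]
    return parse


def regex_parser(contentregex: str) -> Parser:
    """Stand-in for a regex parser: every match becomes its own chunk."""
    pattern = re.compile(contentregex, re.MULTILINE)

    def parse(chunks: List[Content]) -> List[Content]:
        result = []
        for chunk in chunks:
            for match in pattern.finditer(chunk.content):
                result.append(Content(title=chunk.title, content=match.group(0)))
        return result
    return parse


def run_chain(parsers: List[Parser]) -> List[Content]:
    """Feed the output of each entry into the next one, starting empty."""
    chunks: List[Content] = []
    for parser in parsers:
        chunks = parser(chunks)
    return chunks


# Same shape as the entries in the diff: a name plus an ordered parser list.
site = {
    'name': 'my-script',
    'parsers': [static_source('Version: 1.2\nBuild: 7\n'),
                regex_parser(r'Version: \d+\.\d+')],
}

result = run_chain(site['parsers'])
```

Under this model the order of the list matters: the receiver has to come first because it is the only entry that produces content from nothing, and each later parser only narrows what the previous one returned.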