field | value | date |
---|---|---|
author | Jörg Frings-Fürst <debian@jff-webhosting.net> | 2014-09-19 08:38:15 +0200 |
committer | Jörg Frings-Fürst <debian@jff-webhosting.net> | 2014-09-19 08:38:15 +0200 |
commit | b4d4867b851bb2f22bf18aa5f109bd2533d5736f | |
tree | 04b62d50d5e361902c945d29c95713f1a7655401 /README.md | |
Initial import of mwc version 1.7.2-1
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 125 |
1 files changed, 125 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7419ad2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,125 @@
+# MailWebsiteChanges
+
+Python script to keep track of website changes; sends email notifications on updates and/or also provides an RSS feed
+
+To specify which parts of a website should be monitored, <b>XPath selectors</b> (e.g. "//h1"), <b>CSS selectors</b> (e.g. "h1"), <b>and regular expressions can be used</b> (just choose the tools you like!).
+
+MailWebsiteChanges is related to <a href="http://code.google.com/p/pagemon-chrome-ext/">PageMonitor</a> for Chrome and <a href="https://addons.mozilla.org/de/firefox/addon/alertbox/">AlertBox</a> / <a href="https://addons.mozilla.org/de/firefox/addon/check4change/">Check4Change</a> for Firefox. However, instead of living in your web browser, you can run it independently from command line / bash and install it as a simple cron job running on your linux server.
+
+
+<i>This is Open Source -- so please contribute eagerly! ;-)</i>
+
+
+## Configuration
+Configuration can be done by creating a <code>config.py</code> file (just place this file in the program folder):
+Some examples:
+
+### Website definitions
+<pre>
+<code>
+sites = [
+
+ {'shortname': 'mywebsite1',
+ 'uri': 'http://www.mywebsite1.com/info',
+ 'contentcss': 'div'},
+
+ {'shortname': 'mywebsite2',
+ 'uri': 'http://www.mywebsite2.com/info',
+ 'contentxpath': '//*[contains(concat(\' \', normalize-space(@class), \' \'), \' news-list-container \')]',
+ 'titlexpath': '//title'},
+
+ {'shortname': 'mywebsite3',
+ 'uri': 'http://www.mywebsite3.com/info',
+ 'type': 'text',
+ 'contentregex': 'Version\"\:\d*\.\d*'}
+
+]
+</code>
+</pre>
+
+ * parameters:
+
+ * <b>shortname</b>
+ short name of the entry, used as an identifier when sending email notifications
+ * <b>uri</b>
+ URI of the website; If the scheme of the uri is 'cmd://', the string is interpreted as a command and the standard output (stdout) is parsed.
+ * <b>type</b> (optional; default: 'html')
+ content type, e.g., 'xml'/'html'/'text'.
+ * <b>contentxpath</b> / <b>titlexpath</b> (optional)
+ XPath expression for the content/title sections to extract. If you prefer, you could use contentcss/titlecss instead.
+ * <b>contentcss</b> / <b>titlecss</b> (optional)
+ CSS expression for the content/title sections to extract. This is ignored if there is a corresponding XPath definition.
+ * <b>contentregex</b> / <b>titleregex</b> (optional)
+ Regular expression. If XPath/CSS selector is defined, the regular expression is applied afterwards.
+ * <b>encoding</b> (optional; default: 'utf-8')
+ Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
+ * <b>receiver</b> (optional)
+ Overwrites global receiver specification.
+
+
+ * We collect some XPath/CSS snippets at this place: <a href="https://github.com/Debianguru/MailWebsiteChanges/wiki/snippets">Snippet collection</a> - please feel free to add your own definitions!
+
+ * The <b>--dry-run="shortname"</b> option might be useful in order to validate and fine-tune a definition.
+
+ * If you would like to keep the data stored in a different place than the working directory, you can include something like this:
+ <pre>
+ <code>
+ os.chdir('/path/to/data/directory')
+ </code>
+ </pre>
+
+### Mail settings
+<pre>
+<code>
+enableMailNotifications = True #enable/disable notification messages; if set to False, only send error messages
+subjectPostfix = 'A website has been updated!'
+
+sender = 'me@mymail.com'
+smtphost = 'mysmtpprovider.com'
+useTLS = True
+smtpport = 587
+smtpusername = sender
+smtppwd = 'mypassword'
+receiver = 'me2@mymail.com' # set to '' to also disable notifications in case of errors (not recommended)
+</code>
+</pre>
+
+
+### RSS Feeds
+If you prefer to use the RSS feature, you just have to specify the path of the feed file which should be generated by the script (e.g., rssfile = 'feed.xml') and then point your webserver to that file. You can also invoke the mwcfeedserver.py script which implements a very basic webserver.
+
+<pre>
+ <code>
+enableRSSFeed = True #enable/disable RSS feed
+
+rssfile = 'feed.xml'
+maxFeeds = 100
+ </code>
+</pre>
+
+
+### Program execution
+To setup a job that periodically runs the script, simply attach something like this to your /etc/crontab:
+<pre>
+ <code>
+0 8-22/2 * * * root /usr/bin/python3 /usr/bin/mwc
+ </code>
+</pre>
+This will run the script every two hours between 8am and 10pm.
+
+If you prefer invoking the script with an alternate configuration files, simply pass the name of the configuration file as an argument, e.g., for <code>my_alternate_config.py</code>, use <code>mwc --config=my_alternate_config</code>.
+
+
+## Requirements
+Requires Python 3, <a href="http://lxml.de/">lxml</a>, and <a href="http://pythonhosted.org/cssselect/">cssselect</a>.
+For <b>Ubuntu 12.04</b>, type:
+
+ * sudo apt-get install python3 python3-dev python3-setuptools libxml2 libxslt1.1 libxml2-dev libxslt1-dev python-libxml2 python-libxslt1
+ * sudo easy\_install3 pip
+ * sudo pip-3.2 install lxml cssselect
+
+For <b>Ubuntu 14.04</b>, type:
+
+ * sudo apt-get install python3-lxml python3-pip
+ * sudo pip3 install cssselect
+
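
The parameter list in the imported README states two rules for a site definition: a CSS selector is ignored when the corresponding XPath definition exists, and a regular expression, if given, is applied after the XPath/CSS selection. The sketch below illustrates that precedence in plain Python. It is not taken from the mwc source: the helper names `pick_selector` and `apply_regex` are hypothetical, and the selected text chunks are passed in by hand rather than extracted with lxml. The regex is written as a raw string, which matches the same text as the README's `'Version\"\:\d*\.\d*'` while avoiding the invalid-escape warnings that recent Python versions emit.

```python
import re

# Hypothetical site entry in the style of the README's config.py example.
# The raw string keeps the regex backslashes intact.
site = {
    'shortname': 'mywebsite3',
    'uri': 'http://www.mywebsite3.com/info',
    'type': 'text',
    'contentregex': r'Version":\d*\.\d*',
}


def pick_selector(site, part='content'):
    """Return ('xpath', expr), ('css', expr) or None for one site entry.

    Follows the documented precedence: a CSS selector is ignored when
    the corresponding XPath definition exists.
    """
    if site.get(part + 'xpath'):
        return ('xpath', site[part + 'xpath'])
    if site.get(part + 'css'):
        return ('css', site[part + 'css'])
    return None


def apply_regex(site, chunks, part='content'):
    """Apply the optional regex to text chunks that were already selected."""
    pattern = site.get(part + 'regex')
    if not pattern:
        # No regex configured: keep the selected chunks unchanged.
        return list(chunks)
    matches = []
    for chunk in chunks:
        matches.extend(m.group(0) for m in re.finditer(pattern, chunk))
    return matches
```

With a `'type': 'text'` entry and no selector, `pick_selector(site)` returns `None` and the regex runs over the whole response text; with both `contentxpath` and `contentcss` present, only the XPath expression is returned.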