author Jörg Frings-Fürst <> 2014-09-19 08:38:15 +0200
committer Jörg Frings-Fürst <> 2014-09-19 08:38:15 +0200
commit b4d4867b851bb2f22bf18aa5f109bd2533d5736f (patch)
tree 04b62d50d5e361902c945d29c95713f1a7655401 /
Initial import of mwc version 1.7.2-1
Diffstat (limited to '')
1 files changed, 125 insertions, 0 deletions
diff --git a/ b/
new file mode 100644
index 0000000..7419ad2
--- /dev/null
+++ b/
@@ -0,0 +1,125 @@
+# MailWebsiteChanges
+Python script to keep track of website changes; it sends email notifications on updates and/or provides an RSS feed.
+To specify which parts of a website should be monitored, <b>XPath selectors</b> (e.g. "//h1"), <b>CSS selectors</b> (e.g. "h1"), and <b>regular expressions</b> can be used (just choose the tools you like!).
+MailWebsiteChanges is related to <a href="">PageMonitor</a> for Chrome and <a href="">AlertBox</a> / <a href="">Check4Change</a> for Firefox. However, instead of living in your web browser, it runs independently from the command line (e.g., bash) and can be installed as a simple cron job on your Linux server.
+<i>This is Open Source -- so please contribute eagerly! ;-)</i>
+## Configuration
+Configuration can be done by creating a <code></code> file (just place this file in the program folder):
+Some examples:
+### Website definitions
+sites = [
+ {'shortname': 'mywebsite1',
+ 'uri': '',
+ 'contentcss': 'div'},
+ {'shortname': 'mywebsite2',
+ 'uri': '',
+ 'contentxpath': '//*[contains(concat(\' \', normalize-space(@class), \' \'), \' news-list-container \')]',
+ 'titlexpath': '//title'},
+ {'shortname': 'mywebsite3',
+ 'uri': '',
+ 'type': 'text',
+ 'contentregex': r'Version\"\:\d*\.\d*'}
+]
+ * parameters:
+ * <b>shortname</b>
+ short name of the entry, used as an identifier when sending email notifications
+ * <b>uri</b>
+ URI of the website. If the scheme of the URI is 'cmd://', the string is interpreted as a command and its standard output (stdout) is parsed.
+ * <b>type</b> (optional; default: 'html')
+ content type, e.g., 'xml'/'html'/'text'.
+ * <b>contentxpath</b> / <b>titlexpath</b> (optional)
+ XPath expression for the content/title sections to extract. If you prefer, you could use contentcss/titlecss instead.
+ * <b>contentcss</b> / <b>titlecss</b> (optional)
+ CSS expression for the content/title sections to extract. This is ignored if there is a corresponding XPath definition.
+ * <b>contentregex</b> / <b>titleregex</b> (optional)
+ Regular expression. If XPath/CSS selector is defined, the regular expression is applied afterwards.
+ * <b>encoding</b> (optional; default: 'utf-8')
+ Character encoding of the website, e.g., 'utf-8' or 'iso-8859-1'.
+ * <b>receiver</b> (optional)
+ Overrides the global receiver specification for this entry.
+ * We collect some XPath/CSS snippets here: <a href="">Snippet collection</a> - please feel free to add your own definitions!
+ * The <b>--dry-run="shortname"</b> option might be useful in order to validate and fine-tune a definition.
+ * If you would like to keep the data stored in a different place than the working directory, you can include something like this:
+ <pre>
+ <code>
+ import os
+ os.chdir('/path/to/data/directory')
+ </code>
+ </pre>
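+A further sketch of site definitions, combining parameters from the list above that the examples do not yet show. The shortnames, the command, the URI, the selector, and the address are made-up placeholders, and the exact form of the command after 'cmd://' is an assumption based on the description of <b>uri</b>:
+```python
+sites = [
+ # hypothetical entry: parse the standard output of a local command
+ {'shortname': 'localstatus',
+ 'uri': 'cmd://cat /tmp/status.txt',
+ 'type': 'text',
+ 'contentregex': r'Status: \w+',
+ 'receiver': 'admin@example.com'}, # replaces the global receiver for this entry only
+ # hypothetical entry: the CSS selector is applied first, the regular expression afterwards
+ {'shortname': 'mywebsite4',
+ 'uri': 'http://www.example.com/releases.html',
+ 'contentcss': 'div.release',
+ 'contentregex': r'\d+\.\d+',
+ 'encoding': 'iso-8859-1'}
+]
+```
+Running the script with --dry-run="localstatus" would be one way to check such an entry before relying on it.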
+### Mail settings
+enableMailNotifications = True #enable/disable notification messages; if set to False, only send error messages
+subjectPostfix = 'A website has been updated!'
+sender = ''
+smtphost = ''
+useTLS = True
+smtpport = 587
+smtpusername = sender
+smtppwd = 'mypassword'
+receiver = '' # set to '' to also disable notifications in case of errors (not recommended)
+### RSS Feeds
+If you prefer to use the RSS feature, you just have to specify the path of the feed file which should be generated by the script (e.g., rssfile = 'feed.xml') and then point your webserver to that file. You can also invoke the script which implements a very basic webserver.
+ <code>
+enableRSSFeed = True #enable/disable RSS feed
+rssfile = 'feed.xml'
+maxFeeds = 100
+ </code>
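+A sketch of an RSS-only setup, using only the setting names shown in this document; the values are placeholders. Whether the remaining mail settings may be left out is not stated here, so this sketch assumes they stay in the configuration file:
+```python
+enableMailNotifications = False # no update mails; error messages are still sent
+enableRSSFeed = True # write updates to the feed file instead
+rssfile = 'feed.xml' # point your webserver to this file
+maxFeeds = 100
+```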
+### Program execution
+To set up a job that periodically runs the script, simply add something like this to your /etc/crontab:
+ <code>
+0 8-22/2 * * * root /usr/bin/python3 /usr/bin/mwc
+ </code>
+This will run the script every two hours between 8am and 10pm.
+If you prefer invoking the script with an alternate configuration file, simply pass the name of the configuration file as an argument, e.g., for <code></code>, use <code>mwc --config=my_alternate_config</code>.
+## Requirements
+Requires Python 3, <a href="">lxml</a>, and <a href="">cssselect</a>.
+For <b>Ubuntu 12.04</b>, type:
+ * sudo apt-get install python3 python3-dev python3-setuptools libxml2 libxslt1.1 libxml2-dev libxslt1-dev python-libxml2 python-libxslt1
+ * sudo easy\_install3 pip
+ * sudo pip-3.2 install lxml cssselect
+For <b>Ubuntu 14.04</b>, type:
+ * sudo apt-get install python3-lxml python3-pip
+ * sudo pip3 install cssselect