Following on from last week’s w00tw00t block, here’s another quick fail2ban rule, this time to handle some Exim DOS/mail bombing problems.

I already use Fail2Ban to block unauthorised users who attempt to use my mail server as a relay to send spam, and this works very well. Recently, I’ve started seeing messages like these in my Exim logs:

2013-05-21 01:01:52 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)
2013-05-21 01:01:53 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)
2013-05-21 01:01:58 Connection from [2.38.90.63] refused: too many connections: 2 Time(s)
2013-05-21 01:01:59 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)
2013-05-21 01:02:00 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)
2013-05-21 01:02:11 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)

In each case, the IP address originates from somewhere I’d not expect to receive email from, so it looks like some spammers are trying to mail bomb/DOS me.

In jail.local

[exim-dos]
enabled = true
filter = exim-dos
port = all
logpath = /var/log/exim*/mainlog
maxretry = 1
bantime = 3600

In filter.d/exim-dos.conf

# Fail2Ban Exim DOS configuration file.
# Checks for DOS/Flooding attempts.
#
# Author: Marcus Povey
#

[Definition]

# Option: failregex
# Notes.: regex to match the "too many connections" messages in the logfile. The
# host must be matched by a group named "host". The tag "<HOST>" can
# be used for standard IP/hostname matching and is only an alias for
# (?:::f{4,6}:)?(?P<host>[\w\-.^_]+)
# Values: TEXT
#
failregex = \[<HOST>\] .*refused: too many connections

# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =
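
Before enabling the jail, it can be worth checking that the pattern really picks the offending IP out of a log line. The following is a minimal Python sketch, assuming the failregex is `\[<HOST>\] .*refused: too many connections` and substituting the `<HOST>` alias quoted in the filter comments by hand. Fail2Ban's real `<HOST>` handling may differ between versions, and the second sample line is made up for illustration.

```python
import re

# Hand-written stand-in for fail2ban's <HOST> tag, taken from the alias quoted
# in the filter comments (an assumption: the real expansion may be stricter).
HOST = r"(?:::f{4,6}:)?(?P<host>[\w\-.^_]+)"

# The failregex from exim-dos.conf, with <HOST> substituted manually.
FAILREGEX = re.compile(
    r"\[<HOST>\] .*refused: too many connections".replace("<HOST>", HOST)
)

SAMPLE_LINES = [
    # Taken from the log excerpt above.
    "2013-05-21 01:01:52 Connection from [2.38.90.63] refused: too many connections: 1 Time(s)",
    # Hypothetical line that should NOT trigger a ban.
    "2013-05-21 01:02:11 SMTP connection from [192.0.2.10] closed by QUIT",
]


def banned_hosts(lines):
    """Return the host captured from every line the failregex matches."""
    hosts = []
    for line in lines:
        match = FAILREGEX.search(line)
        if match:
            hosts.append(match.group("host"))
    return hosts


print(banned_hosts(SAMPLE_LINES))  # ['2.38.90.63']
```

Fail2Ban also ships a `fail2ban-regex` tool that runs a filter against a real log file, which is a better final check than this approximation.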

Some potential gotchas

You may notice that I’ve set the bantime to quite a low value. This is because the rule can produce false positives or collateral damage in certain situations.

Most likely you’ll get the too many connections error when some naughty fellow starts mailbombing you, but sometimes connections will be refused for legitimate users while an attack is in progress, which would result in the good guys being banned as well as the bad.

Setting bantime to something relatively short (one hour in my example) should limit fallout, since legitimate email servers will retry later, while most script kiddies will have moved on.

» Visit the project on Github…

Sometime in the next couple of weeks I will be performing a major software upgrade on the server that hosts this blog, as well as a number of services I host on behalf of my clients.

What this means to you

Hopefully nothing.

All being well, there should be no significant downtime and the services hosted by this server will continue uninterrupted.

If you are a client of mine, I will be contacting you directly in the next few days with more details about when the upgrade will be performed and how it might affect you.

I apologise in advance for any possible inconvenience this may cause.

NoSQL is the name given to a collection of newer database storage systems, which, among other things, don’t require a database schema to be defined ahead of time. They have become increasingly popular in recent years, and a large part of the reason is that they offer a number of significant scalability advantages over traditional relational database systems, especially when deployed in modern distributed web architectures.

When Elgg was coded, all those years ago, the standard web application environment was LAMP, where the M of course meant MySQL. This was fine for the time, but things have moved on, and I have been getting an increasing number of queries from people asking me how they might go about migrating Elgg over to NoSQL, so I thought it’d be worth writing up some of my thoughts on the subject.

I caveat all of this heavily by saying that, whatever you do, migrating Elgg over to NoSQL is going to be a big job, and additionally I’ve not actually tried to do it (and I’m not likely to, unless someone persuades me). However, the following should give you a place to start…

The Object Model

The good news is that Elgg’s object model, together with its key -> value metadata system, is actually pretty well suited to NoSQL. Additionally, the fact that every entity in Elgg has a globally unique identifier, which can canonically identify an object, means that you should run into fewer issues when you come to scale.
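
To make that concrete, here is a rough Python sketch of how an entity and its key -> value metadata might fold into a single document of the kind a document store holds. The row shapes are only loosely modelled on Elgg's tables, and the sample data and the `to_document` helper are hypothetical.

```python
import json

# Hypothetical rows, shaped loosely like an Elgg entity and its metadata.
entity_row = {
    "guid": 42,
    "type": "object",
    "subtype": "blog",
    "owner_guid": 7,
    "time_created": 1369098112,
}
metadata_rows = [
    {"entity_guid": 42, "name": "title", "value": "Hello"},
    {"entity_guid": 42, "name": "tags", "value": "elgg"},
    {"entity_guid": 42, "name": "tags", "value": "nosql"},
    {"entity_guid": 99, "name": "title", "value": "Someone else's"},
]


def to_document(entity, metadata):
    """Fold the metadata rows for one entity into a single nested document."""
    doc = dict(entity)
    doc["metadata"] = {}
    for row in metadata:
        if row["entity_guid"] != entity["guid"]:
            continue  # belongs to a different entity
        # A name may carry several values, so always collect into a list.
        doc["metadata"].setdefault(row["name"], []).append(row["value"])
    return doc


document = to_document(entity_row, metadata_rows)
print(json.dumps(document, indent=2))
```

Because the GUID identifies the whole document, it is a natural candidate for the store's primary key.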

Obtaining this GUID (and in fact any identifier, such as metadata IDs, annotation IDs and so on) presents you with your first major issue.

Currently, Elgg uses MySQL’s auto_increment value in the table. This was simple, and writing a row and receiving its ID is an atomic operation, meaning you don’t have to lock the table or do any other fancy stuff to ensure that the ID you receive is the correct ID for the record you’ve just written. It does however limit how far you can scale out, since you must always have one canonical write database in order to get IDs that are globally unique throughout the system.

Were I to write Elgg today, I would not have done it this way.

A starting point for addressing this issue would be to look at something like Twitter Snowflake. Snowflake is a server process which returns algorithmically generated identifiers that are unique and increase over time. How important that ordering is in practice is up for debate, since most native operations sort on a separate time_created field.
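
As an illustration of the idea, here is a minimal Python sketch of a Snowflake-style generator. It is not Twitter's implementation. It assumes the commonly described layout of a millisecond timestamp, a 10-bit worker ID and a 12-bit per-millisecond sequence, and the epoch and class name are arbitrary choices for this example.

```python
import threading
import time

# Assumed layout: timestamp | 10-bit worker id | 12-bit sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1_356_998_400_000  # 2013-01-01 UTC, an arbitrary choice


class SnowflakeStyleGenerator:
    """Issues unique, increasing 64-bit IDs without a central database."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeStyleGenerator(worker_id=3)
ids = [generator.next_id() for _ in range(1000)]
```

Each writer gets its own worker ID, so no two nodes can issue the same value, and IDs from one node always increase.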

One assumption that is made quite widely throughout Elgg (and also a fair few plugins), however, is that GUIDs are integer values. There’s no getting around the fact that this is going to cause a fair amount of pain.
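
One way the integer assumption could bite, assuming 64-bit Snowflake-style IDs and code somewhere that forces them into a 32-bit integer, is silent truncation. The Python sketch below emulates that with `ctypes`. The ID value is made up, and how a given platform actually behaves on overflow is not something this example settles.

```python
import ctypes

# Hypothetical Snowflake-sized GUID, far beyond the 32-bit range.
big_guid = 352_347_865_112_345_600

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Emulate storing a value in a signed 32-bit integer (wraps silently)."""
    return ctypes.c_int32(value).value


fits = big_guid <= INT32_MAX                # False: the ID is too large
survives = as_int32(big_guid) == big_guid   # False: the stored value differs
print(fits, survives)
```

Any code path that does this would end up pointing at a different record, or at none at all.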

Objects and Functions

Once the data model has been migrated over to NoSQL, you’re going to have to modify the Elgg core database retrieval functions.

For the really low level get_entity() method, and similar functions which return individual records, this should be fairly straightforward. For the more involved get_entities* functions, you’re going to have to get a little more creative. Since 1.8, Elgg has allowed you to specify custom JOIN and WHERE clauses, so these are going to have to be remapped.
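
To show the difference between the two cases, here is a small Python sketch that uses an in-memory dict as a stand-in for the NoSQL store. The function names echo Elgg's, but the bodies, the option names (`type`, `subtype`, `owner_guid`, `limit`, `joins`, `wheres`) and the sample data are assumptions made for illustration, not Elgg's actual API.

```python
# In-memory stand-in for a document store, keyed by GUID (hypothetical data).
STORE = {
    1: {"guid": 1, "type": "user", "subtype": None, "owner_guid": 0, "time_created": 100},
    2: {"guid": 2, "type": "object", "subtype": "blog", "owner_guid": 1, "time_created": 200},
    3: {"guid": 3, "type": "object", "subtype": "blog", "owner_guid": 1, "time_created": 300},
    4: {"guid": 4, "type": "object", "subtype": "file", "owner_guid": 1, "time_created": 250},
}

SIMPLE_FILTERS = ("type", "subtype", "owner_guid")


def get_entity(guid):
    """Single-record lookup: a plain key fetch, the easy case."""
    return STORE.get(guid)


def get_entities(options):
    """Multi-record lookup: simple options map cleanly onto field filters."""
    if options.get("joins") or options.get("wheres"):
        # Raw SQL fragments have no direct equivalent and need hand remapping.
        raise NotImplementedError("custom JOIN/WHERE clauses must be remapped")
    filters = {k: v for k, v in options.items() if k in SIMPLE_FILTERS}
    matches = [
        doc for doc in STORE.values()
        if all(doc.get(key) == value for key, value in filters.items())
    ]
    # Newest first, sorted on time_created rather than on the GUID.
    matches.sort(key=lambda doc: doc["time_created"], reverse=True)
    return matches[: options.get("limit", 10)]


blogs = get_entities({"type": "object", "subtype": "blog"})
print([doc["guid"] for doc in blogs])  # [3, 2]
```

The `NotImplementedError` branch is where most of the real work would sit, since each custom clause has to be translated by hand.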

It is possible that there are some libraries or DB front end layers available to simplify this process significantly, but I’m not currently aware of any.

Plugins

Migration of plugins is going to be either really easy or really hard, depending on how they’ve been written. If they use core Elgg functions and don’t make too many assumptions, you should in theory be able to virtually drop them in and hit go (after maybe changing any places where the plugin casts GUIDs to an integer, if you’re using Snowflake).

Plugins which make their own DB queries (there shouldn’t be any, though a few do exist) will obviously cause you a bit of a headache.

Anyway, those are my first thoughts on the matter. I’d be interested to hear from anybody who’s tried this!