Email Parsing Scripts and Silly Things to NOT do with an email response script

June 23rd, 2009 Syn No comments

Ok, I must start this post with a warning – this is a warning to anyone writing scripts which are triggered by emails.

Firstly, email is evil, and the world will be a better place without it. I know this is a personal opinion, but I stand by it.

cPanel / WHM users: you can point a mail address at a script by using the redirect method – just as you would send it to another email address, but instead of typing the mail address you type a pipe char (“|”) followed by the exact path of the script you want to run:

For example:

|/home/syn/mailscript.php

The script should be written just like a standard Linux console script, so you would start it with a hashbang such as #!/bin/php (or whatever the path to your interpreter is). Also, your script needs to be executable by the mail user (which means it also needs to be readable by that same user):

chmod 755 /home/syn/mailscript.php

To parse the email (presuming you are using Exim or similar) you then read from standard input. For PHP programmers out there, the standard input/output library page is a place to start. I personally just used

$email = file_get_contents("php://stdin");

So, the next part is how to parse it. Emails come in exactly two parts – the headers, then the body. The body might be broken up into lots of little parts itself, but the headers are not. It is one header per line (a long header may be folded onto following lines that start with whitespace) until you reach a line which is empty. In PHP I like to explode the file into lines, and parse the two separately – I have a boolean flag set to tell me when I’ve reached the end of the headers.

Now, here is where I need to separate you readers out. If you are planning to insert the data or trigger a function or do anything that ISN’T forwarding the email on, then you have read enough, and there is nothing stopping you from implementing your own script.

However, if you plan on forwarding the email, it is very tempting to just shove all the headers back into $headers (remembering to add newlines back on), shove all the body back into $body and then in PHP, use its little mail function with:

mail($to, $subject, $body, $headers);

I cannot stress enough how *BAD* an idea this is. The problem is that you must strip the To: address from the headers (and indeed any Deliver-To addresses). If you do not, you *will* crash your server.

Why?

The problem is with the optional $headers parameter – if you had decided not to include it, you wouldn’t have a problem, but as you did, you do. The $to and $subject parameters are just added to the header block, giving you the correct fields. The email gets delivered, but also, because the old To: address is still in there, a copy is delivered back to the original destination … which would be your script … again.

Congrats! A forkbomb. Before you know it, your entire server will run out of processes.
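The split-then-strip approach described above can be sketched roughly as follows. This is a minimal sketch in Python rather than the PHP used in this post, and the function names, the DROP list and the sample message are all assumptions made for illustration, not the script I actually run. A real pipe script would read the raw message from standard input instead of a hard-coded sample.

```python
def split_email(raw):
    """Split a raw message into (list of headers, body) at the first empty line."""
    headers, body_lines, in_headers = [], [], True
    for line in raw.splitlines():
        if not in_headers:
            body_lines.append(line)
        elif line == "":
            in_headers = False              # the empty line ends the header block
        elif line[0] in " \t" and headers:
            headers[-1] += "\n" + line      # folded continuation of the previous header
        else:
            headers.append(line)
    return headers, "\n".join(body_lines)


# Header names to drop before forwarding; this list is an assumption for the sketch.
DROP = ("to", "deliver-to", "delivered-to")


def forwardable_headers(headers):
    """Remove the headers that would route the forwarded copy back to the script."""
    kept = []
    for header in headers:
        name = header.split(":", 1)[0].strip().lower()
        if name not in DROP:
            kept.append(header)
    return kept


# Hypothetical sample message; a real script would use sys.stdin.read() here.
sample = (
    "From: alice@example.com\n"
    "To: script@example.com\n"
    "Subject: a long subject\n"
    " folded onto a second line\n"
    "\n"
    "Hello,\n"
    "\n"
    "this is the body.\n"
)
headers, body = split_email(sample)
safe_headers = forwardable_headers(headers)
```

Note that the sketch never prints anything, for the reason given in the next part of this post.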

Also, don’t print … anything.

Don’t print (or echo) anything, as that goes to stdout. If Exim receives any output, it returns that output, including your script information, in a delivery failure email to the “from” address. I am of the school of script writing in which the script should say “OK” when it finishes successfully, but here it really shouldn’t.


PHP and Small Websites

May 27th, 2009 Syn No comments

It’s been a long time since I was last programming in PHP, but FEX asked me to do some coding for him the other day, and I’m surprised at how long it took me to “get back into the swing” of it. I had forgotten some of the problems I experience with PHP – firstly the lack of consistency in the naming of its built-in functions (a massive issue with its v3 and earlier style functions, and apparently to do with the way the language was developed – some functions being studlyCaps and some being with_underscores).

Also, the amount of time spent writing “proper” website engines (that is with services behind them to do the processing of data) is really showing – doing inline parsing of XML documents in PHP really is … nasty. I feel quite dirty now. 

But Smarty is still amazing – if not more amazing than it was when I was using it four years or so ago. Smarty is the basis for the design of my C# Templator class – when I left PHP for C#, I needed something which had the power of Smarty, but was for .Net. What I wrote didn’t have the full functionality – that was always a case of “if I need it, I’ll write it” – but it does have a couple of extra, quite cool mods.

Anyways, the problem is that PHP is really good if you want to make a really nice small website that has limited functionality but works, and shouldn’t require much in the way of modifications or expansions to do much else. Is that a shame?


Uncharted Territory – Mac, Windows and Linux Serial Ports and Services

May 15th, 2009 Syn No comments

So my Uncharted territory alarm is going off big-time. Cross platform code is nice and easy, if you don’t want to interface with hardware – if you only care about interfacing with a web server, or doing a calculation, then sure, why not have stuff cross platform.

C# and SerialPorts on the Mac are just a no-no. C#, Mono and Serial Ports in Windows and Linux (or Visual Studio in Windows) are good to go – there are some interesting differences in how the two present devices (Linux presents a list of possible devices which may, or more likely don’t, exist; Windows presents only devices which definitely do exist and are connected). Don’t even get me started on my failed “use as a file” /dev/tty.usbserial experiment.

Python on the other hand, has absolutely no problem with its cross platform serial support, however, Python does begin to have some interesting differences when it comes to Server Sockets and how to deal with them. 

From: http://www.amk.ca/python/howto/sockets/ 

“On Unix, select works both with the sockets and files. Don’t try this on Windows. On Windows, select works with sockets only. Also note that in C, many of the more advanced socket options are done differently on Windows. In fact, on Windows I usually use threads (which work very, very well) with my sockets. Face it, if you want any kind of performance, your code will look very different on Windows than on Unix. (I haven’t the foggiest how you do this stuff on a Mac.)”

So now I’m left with some confusing options: 1) attempt to write a Python program which works kind of the same way on Windows, Macs and Linux; 2) write a Python program for the Macs and a C#/.Net/Mono program for Windows and Linux; or 3) think of something else.

Architecture of Communication Channel

The system at the moment is designed as follows:

hsmlayout

Here we can see that there are four main components – the first is the HSM, which is written in C and stored on the Arduino boards. This is where the security keys are stored, and it performs a lot of the cryptography to generate symmetric keys with the HSM at the other end.

The HSM Service is here to provide a locking tool – because the HSM is on a serial port, only one app can talk to it at a time. To get a semblance of locking and thread control, the HSM tools (and applications which wish to use the HSM) will have to talk to the HSM Service. The connection between the HSM tools and the HSM Service is currently out of scope.

To create a connection between two applications (for example, a web browser and a web server), the following happens. Please be wary of the maths displayed – exponents rely on <sup> tags, and I suspect the stylesheet might be removing their effectiveness.

  1. Both computers have ArcaHSMs plugged in and the HSM Service is running on both computers
  2. The user who wants to connect to the other computer issues “connect to” command or program. eg: 
    >HSMConnect --device /dev/tty01 --host secured.example.com --port 1234
  3. The HSM Server loads the HSM Hardware ID and then makes contact with the other server – the remote server can check if it has already performed a key exchange (deleting a pair of keys is currently out of scope). If not, it performs a key exchange:
    1. The Host begins the Diffie-Hellman exchange and picks the prime number (p) and the primitive root (g) (g will be 5)
    2. The Host HSM chooses a secret integer a, then sends the Client $A = g^a \bmod p$
    3. The Client HSM chooses a secret integer b, then sends back to the Host $B = g^b \bmod p$. The Client is also able to compute $s = A^b \bmod p$ and stores s in the HSM for the Hardware ID of the Host.
    4. The Host HSM is now able to compute $s = B^a \bmod p$ and stores s in its HSM for the Hardware ID of the Client.
  4. The Host then produces an 8-byte random data block and returns that data block to the connecting Client.
  5. Both sides encrypt the random data block with the shared key to generate a transport key.
  6. The transport key is now returned to the HSM Service on both sides. All data to and from the Host is now encrypted using this key.
  7. The connecting service now informs the Host which mode it is swapping into
    • Execute Program
    • Connect to Port

    In this instance, the connecting service sends a packet to connect it to port 1234 on the target machine. The Connecting service also opens a server socket on its local machine (Restricted to only accepting connections from localhost? Again, out of scope). If the system was connecting to a shell, instead the service would specify to run a shell program.

  8. The Host confirms this. All data passed now is piped into the relevant port
  9. The User then connects their webbrowser to http://localhost:1234

Not all services will be able to work over this method. It might be required for certain applications to use certain proxy modes for them to work.
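The key exchange in step 3 can be sketched like this. It is a minimal sketch in Python, not the HSM firmware: the function names are made up for illustration, and p = 23 is a toy prime chosen only so the numbers stay readable (g = 5 is the value from the list above). Nothing here covers the transport key steps.

```python
import secrets


def dh_keypair(p, g):
    """Pick a secret exponent and return (secret, public value g^secret mod p)."""
    secret = secrets.randbelow(p - 3) + 2    # secret in the range 2 .. p-2
    return secret, pow(g, secret, p)


def dh_shared(their_public, my_secret, p):
    """Combine the other side's public value with our own secret."""
    return pow(their_public, my_secret, p)


# Toy parameters for illustration only: far too small for real use.
p, g = 23, 5

a, A = dh_keypair(p, g)          # Host HSM: secret a, sends A to the Client
b, B = dh_keypair(p, g)          # Client HSM: secret b, sends B back to the Host

s_client = dh_shared(A, b, p)    # Client computes s = A^b mod p
s_host = dh_shared(B, a, p)      # Host computes s = B^a mod p
print(s_host == s_client)
```

Both sides end up with the same s without a or b ever crossing the wire, which is the whole point of the exchange.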

Security Notes

Attacking the HSM with new Firmware

There will have to be two versions of this HSM. The first version will use the hardware serial port. This causes a [minor] security hole in that someone can upload new firmware to the device and then start doing stuff with the EEPROM data. The solution to this is to have a version which is connected to a software serial port which is *not* wired into the upload pins. This means that the ISP pins are not actively wired into any computer – to reprogram, the attacker will need physical access to the device.

I believe the same thing can be done though by programming through the normal hardware serial, then blowing the fuse bits which let you do ISP programming.

Key Exchange

The biggest weakness at the moment is key exchange. Because of constraints of processing on the Arduino board, I suspect that I am going to struggle to get a 1024-bit key. From my last blog post (here) it can be seen that this effectively reduces the entire system down to the equivalent of a 70-80 bit key. This is only true for the server-server cryptography and not for the “local” keys, which use full 128-bit AES cryptography and have no public component.

Finally, the last problem is that whilst I can generate random numbers with ease, having 1024-bit prime numbers lying around (and proving they’re prime) is really going to eat up my memory, so I think for the moment this will have to fall to the HSM Service to generate and maintain that list. The HSM will have to (at the moment) presume that the prime number given is indeed prime.



Symmetric Key sizes

May 14th, 2009 Syn No comments

“Writing Secure Code” [1] contains a table which is derived from the Internet Draft published by the IETF [2] – the URL given in the book is actually no longer valid, probably because it is no longer a draft but a published RFC (see [2]).

Symmetric Key Size (bits)   Equivalent RSA Modulus Size (bits)   Equivalent DSA Subgroup Size (bits)
70                          947                                  128
80                          1228                                 145
90                          1553                                 153
100                         1926                                 184
150                         4575                                 279
200                         8719                                 373
250                         14596                                475

What this means is that, comparatively, the 128-bit AES based Arduino system is the equivalent of an RSA system of well over 2000 bits (the table puts a 100-bit key at 1926 bits and a 150-bit key at 4575 bits). On the one side, this means that the security it will provide is relatively good (for various values of relative) for communicating between systems, and on the other, it means that to maintain this level of security, the key exchange will have to use a PKI key size of that same length.
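A rough way to read the 128-bit case off the table is to interpolate between the 100-bit and 150-bit rows. The sketch below does that in Python. Straight-line interpolation is my own assumption and is crude (the real relationship is not linear), so treat the result as a ballpark figure only. The function name is made up for illustration.

```python
# (symmetric bits, RSA modulus bits) pairs copied from the table above
TABLE = [(70, 947), (80, 1228), (90, 1553), (100, 1926),
         (150, 4575), (200, 8719), (250, 14596)]


def estimate_rsa_bits(sym_bits):
    """Linearly interpolate between the two nearest table rows (a crude estimate)."""
    for (s0, r0), (s1, r1) in zip(TABLE, TABLE[1:]):
        if s0 <= sym_bits <= s1:
            return r0 + (r1 - r0) * (sym_bits - s0) / (s1 - s0)
    raise ValueError("outside the range covered by the table")


print(round(estimate_rsa_bits(128)))
```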

I also found (and then lost) a reference to the idea that Diffie-Hellman key lengths should be considered the same as RSA key lengths. So this allows us to put a figure on the security of the key exchange as well as the other end. In other news, I’ve “acquired” an account on a Mac with an Arduino plugged in, so I’m working on that as well to make sure that the system is cross-platform compliant.

[1] Writing Secure Code, Second Edition, Howard and LeBlanc, Microsoft Press, 2003. ISBN 0-7356-1722-8. Chapter 8, “Cryptographic Foibles”, p. 275
[2] “Determining Strengths For Public Keys Used For Exchanging Symmetric Keys”, IETF, http://tools.ietf.org/html/rfc3766


Diffie-Hellman-Merkle Key Exchange and Arduino

May 7th, 2009 Syn No comments

Wow, I get the greatest titles for these posts, and yet again, it’s about encryption on the Arduino board. The HSM design for stand-alone encryption and decryption is going well – I have a framework which can perform 3 * 16 byte AES operations per second – oooh, 48 bytes per second of throughput!

The system is not designed to do *lots* of cryptography – instead it is designed to be able to generate “transport” or “storage” keys for the end user. I have got 512 bytes of data, and I have decided to split the data into two areas – allow 1 * 16 byte area for storing data about the HSM, then allow 15*16 block keys for local encryption, and then allow in the final 256 bytes, 10 * 16+8 byte blocks for the storage of paired keys with other HSM’s. This allows the ArcaHSM to communicate with just 10 similar HSM units.
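The arithmetic of that split can be checked with a small sketch. This is Python used purely as a calculator, not firmware: the constant and function names are made up for illustration, and I am assuming the paired-key area starts exactly at the 256-byte mark.

```python
EEPROM_SIZE = 512
BLOCK = 16

INFO_BLOCKS = 1             # one 16-byte area for data about the HSM itself
LOCAL_KEYS = 15             # 15 * 16-byte keys for local encryption
PAIRED_SLOTS = 10           # one slot per paired HSM
PAIRED_SLOT_SIZE = 16 + 8   # 16 + 8 bytes per slot, as described above

LOCAL_OFFSET = INFO_BLOCKS * BLOCK      # local keys follow the info block
PAIRED_OFFSET = EEPROM_SIZE // 2        # assumed start of the final 256 bytes


def local_key_address(index):
    """EEPROM address of local key number `index` (0-based)."""
    if not 0 <= index < LOCAL_KEYS:
        raise IndexError("no such local key")
    return LOCAL_OFFSET + index * BLOCK


def paired_slot_address(index):
    """EEPROM address of paired-key slot number `index` (0-based)."""
    if not 0 <= index < PAIRED_SLOTS:
        raise IndexError("no such paired slot")
    return PAIRED_OFFSET + index * PAIRED_SLOT_SIZE


first_half_used = LOCAL_OFFSET + LOCAL_KEYS * BLOCK
second_half_used = PAIRED_SLOTS * PAIRED_SLOT_SIZE
spare_bytes = EEPROM_SIZE - PAIRED_OFFSET - second_half_used
print(first_half_used, second_half_used, spare_bytes)
```

The first half is filled exactly, and the ten paired slots leave a few bytes spare at the end of the second half.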

But. There is a problem. I’m having trouble finding something which will allow me to do [secure] key exchange in the memory that the Arduino has available. So I have a solution – I use PROGMEM to use up 4 k of program space to store the variables I need to use for key exchanges. This has three problems:

  1. I use up 4 k of program space permanently – I mean, forget completely the angst I had yesterday over using up 200 bytes more for a 25% speed boost. Seriously, what was I thinking? Who needs LCD displays?
  2. I can only do one key exchange at a time, i.e. I must complete one key exchange before beginning a new one.
  3. This will be re-writing the flash – this will mean I can only do 100000 key exchanges before it fails (the same cannot be said for the keys – because there are 15 keys, I can do 15 * 100000 key changes before the thing dies).

Remember though – you use the keys to generate keys, so you don’t need to change keys very often. Like never is probably ideal.

I also have a second problem. I’ve put down my Arduino board with the LCD shield AND the running prototype version of the HSM software somewhere, and erm, I don’t know where. Doh!


AES on Arduino

May 6th, 2009 Syn No comments

Ilya at LiterateCode.com is really hot on coming back on people’s comments (see here). I am impressed by the speed that they came back – obviously they have good source code management (something that I suck at for personal projects).

Anyways, their new code is actually … better. I don’t mean that the update he mentioned (adding in the tables version) is better, I mean the non-tabled version is better as well. I’m not a C expert (or indeed a crypto guru) so I’m having a hard time putting my finger on why it is so much better from the diffs.

Here’s some hard evidence. Please note that size includes my test harness and the data I’m encrypting etc., not just the size of the compiled library. Edit: times given are in ms, as timed by the difference in responses from millis().

Lib                            Size   1     2     3     4     5     6     7     8     9     10    Mean
Original Library               4636   5297  5297  5297  5297  5296  5296  5296  5296  5296  5296  5296.4
New Library                    4680   416   416   416   415   416   416   416   416   415   416   415.8
New Library (Back to Tables)   5092   310   309   309   309   309   309   309   309   309   310   309.2

This was all run on an Atmel ATMega 128 using Arduino. I did not modify the library to use PROGMEM to store the tables in the program memory but instead they are as is in the lib from Ilya, which means they would be loaded into RAM iirc(?)

The original problem was that the library (whilst working) took 5 seconds per crypto operation. This library now takes under 1/10th of that time. Is the benefit from dropping to tables (a 25% time benefit) worth the 300 bytes (and who knows what % of my 1k memory) of program size? In embedded work, it’s possible that those 300 bytes are the difference between having an LCD module and not.

(For reference, on the Atmel ATMega 168, 300 bytes is aprox 2.14% of the code space)
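For anyone wanting to check the comparison, the figures can be recomputed from the ten timings and the sizes in the table above. This is a small Python sketch used as a calculator; the variable names are made up for illustration, and the size difference it reports includes my test harness, just as the table does.

```python
from statistics import mean

# Timings in ms, copied from the ten runs in the table above
runs = {
    "original": [5297, 5297, 5297, 5297, 5296, 5296, 5296, 5296, 5296, 5296],
    "new": [416, 416, 416, 415, 416, 416, 416, 416, 415, 416],
    "new_tables": [310, 309, 309, 309, 309, 309, 309, 309, 309, 310],
}
# Sketch sizes in bytes, including the test harness
sizes = {"original": 4636, "new": 4680, "new_tables": 5092}

means = {name: mean(times) for name, times in runs.items()}

speedup = means["original"] / means["new"]                 # old library vs new library
table_saving = 1 - means["new_tables"] / means["new"]      # fraction of time saved by tables
extra_bytes = sizes["new_tables"] - sizes["new"]           # size cost of the tables build

print(f"speedup {speedup:.1f}x, tables save {table_saving:.1%}, cost {extra_bytes} bytes")
```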


Arduino HSM and Random Number Gen Update

May 5th, 2009 Syn No comments

This is now in three parts. Firstly, the Random Number Generator. Building this weekend was a semi-success. The cost of building one is minor – the biggest cost is the magnets (I used super strong ones, ok?) – and then you have the basics of the random number generator.

Now here’s where things began to go wrong – in my mind I imagined a magnet and [two] coil[s] where, when the magnet moves about, it would induce a current in the coil which you detect on the analog input pins of the Arduino. What I forgot was how fast you have to move the magnet to actually make a voltage high enough for it to be detectable. Doh!

I now have two options.

  1. build what is called a magnetometer which can be used to detect the changes in the magnetic field to a much better degree. or
  2. use a different mechanism for detecting the movement of the pendulum.

Also yet to be built is the “boost” function of the generator which will prevent it getting “stuck”. But here’s a picture of where it is at the moment:

imga0004

A close up of the top half (with the coil mocked up, as it’s been removed now): the magnet on the top moves in and out of the coil to generate a voltage. This is probably going to change in the short term to a sharp distance measurer, or two, to be able to work out the top half’s position.

imga0003

The above image shows detail of the bottom part – the magnet on the bottom is set so that the magnets on the floor repel – this minimises locking due to magnetism. Here you can see I’ve used rare-earth magnets for a bigger effect.

imga0005

This final image gives you an idea of size.

Please ignore the choice of books – the Complete Reference C is one I picked up cheap, and is no way an endorsement of it (however Code Complete is awesome and I would recommend reading/buying/memorising).

And moving on to the HSM side. The HSM itself uses an AES library from literatecode.com which allows me to perform AES encryption and decryption on the Arduino board. Some (and I must stress, only some) of the HSM interface has been written, only to prove the idea behind some of the work … but erm … well, let’s have a look:

Pre 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 10 21 32 43 54 65 76 87 98 A9 BA CB DC ED FE F
Int C9 76 27 4D BA 02 FB 5D C5 58 78 E4 48 C3 9B 8C 10 21 32 43 54 65 76 87 98 A9 BA CB DC ED FE F
Pos 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 10 21 32 43 54 65 76 87 98 A9 BA CB DC ED FE F

So what is this? Well, the PRE line is the data I’m encrypting, the INT line is the data as it is encrypted, and the POS line is the data post decryption, showing it’s the same as the first. The only problem is that the library is only encrypting the first block of 16 bytes, so I need to look more carefully at what is going on there, and whether I need to support block modes etc.
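To make the "first block only" problem concrete, here is a sketch of running a single-block operation over every 16-byte block of a buffer. It is Python rather than the C on the board, the names are made up for illustration, and toy_block is a loud placeholder that is NOT AES (it just flips every bit) so that the sketch runs without a crypto library. Handling each block independently like this is the simplest option and has the known weakness that identical input blocks give identical output blocks, which is where proper block modes come in.

```python
BLOCK_SIZE = 16


def process_blocks(data, block_fn):
    """Apply block_fn to each 16-byte block of data in turn."""
    if len(data) % BLOCK_SIZE:
        raise ValueError("data length must be a multiple of 16 bytes")
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        out += block_fn(data[offset:offset + BLOCK_SIZE])
    return bytes(out)


def toy_block(block):
    """Placeholder for a one-block cipher call: NOT AES, it only flips every bit."""
    return bytes(b ^ 0xFF for b in block)


data = bytes(range(32))                             # two blocks of sample data
encrypted = process_blocks(data, toy_block)
decrypted = process_blocks(encrypted, toy_block)    # the toy operation is its own inverse
print(decrypted == data)
```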

What is needed now is a better design of what I am expecting from the HSM, and what interfaces are needed.

Part Three: The visuals. I’ve started work on what visuals I’m going to need – I’m very wary that the random number generator AND encryption library will take up a “sizable” amount of memory. If I want to include the flash memory stick stuff (for storing more keys in) I may have to consider ditching the LCD display from the design just because of memory space in the flash (or consider upgrading to a bigger chip and/or using two Arduinos, and having a dedicated control module).


Arduino HSM

April 30th, 2009 Syn No comments

A while ago I embarked on a mission to make a “mini hardware security module” (HSM) using an Arduino board, and I had decided upon using DES / 3DES as the basis for the algorithm. Unfortunately, I had lots of difficulties in porting various existing libraries in. My mission was to use an existing library (like OpenSSL or matrixssl (http://matrixssl.org/)) but I never managed it successfully. I didn’t want to write my own DES library for fear of the complexities of DES compared with the amount of time I had to develop it.

At the time, I wrote:

… some googling for “3des openssh” has found that other people have also been looking for embedded libraries – or libraries for embedded devices. It also looks like AES might be a smaller function than the 3DES one. That surprised me.

More research, after a discussion with a work colleague, into using the Arduino as a cryptography dongle type application also came out with the conclusion that AES is the better way to go. At the end of the day, if you’re just generating 3DES keys for your financial application, you might as well generate them from internal AES crypto.

Very Quickly I found this: http://www.hoozi.com/Articles/AESEncryption.htm and to quote it:

The main feature of this AES implementation is not efficiency; It is simplicity and readability.

And the page contains two files for encryption and decryption in C/C++ so that should be able to port to the Arduino either verbatim or with very minor changes. We shall see!

Anyways, the design of the HSM at the moment is as follows: 

tinyhsm

From that you should be able to see I’m going for a very basic loadout of an LCD shield (from nuelectronics), which will allow me to show status messages and take input via a joystick on the shield, and the actual Arduino as well. I am toying with adding the USB flash stick for large file storage, but at the moment I’m not sure about that (although it would be secure because the data in the file would be encrypted with a key stored in the HSM itself).

Of course this HSM would be vulnerable to being edited via the ISP port – so its physical security will be of the utmost importance if it is to be safe.


Human Replacement with a Small Shell Script

April 29th, 2009 Syn No comments

I have long ago threatened to replace you with a small shell script. That time has come. This isn’t a discussion on AI, but instead a few simple ways to generate pseudo random English. The initial part of this project is to document a relatively old trick to replace someone with a script. The idea is to generate a function, and port that function to various languages to show how easy it is to produce a bit of text in the language that someone else uses.

It should be noted that the function does not generate NEW text, it merely uses text that a person has already used. For best results, you need a lot of text written by one person – mixing and matching people and indeed style will produce a very confused response.

Method

The system works by using a corpus written by the target, a “length” n and a starting phrase, which is at least as long as the length.

The output has to be buffered slightly – you start with the starting phrase, take the last n characters of the phrase (including punctuation), search for all the occasions that sequence occurs in the corpus, and for each one take the next character that appears and add it to an array.

Once all the occasions have been found, if the array is not empty, then a random letter is picked from it – this is where the corpus actually plays its part. If a user uses particular sequences of letters more than others, then the probability is affected, i.e. there may be multiple instances of some letters in the array (if the length is 3 and the last three letters are “ th” then the likelihood is that the array will contain {e, e, e, a}, so the system is likely to pick an e).

You now append that letter to your output, and go again, still only using the last n letters. For example:

This is a sample corpus that the test can write and think about without the test failing in its demonstration of good use. Lets see what happens when the test gets given the corpus with a length of just two and a starting word of “th” and whether it will work.

Starting phrase: “th”, so the last 2 letters are “th”. Search the corpus above for every “th” (matching case, so the “Th” of “This” does not count) and collect the character that follows each one:

Select from: { a, e, i, o, e, e, e, _, _, ”, e } Selects an “e”

Phrase now: “the”, last 2 letters “he”, search:

Select from: { _, _, n, _, _, t, r } Selects an “_” (space)

Phrase now: “the ”, last 2 letters “e ”, search:

Select from: { c, t, a, t, w, t, c } Selects a “w”

Phrase now: “the w”, last 2 letters “ w”, search:

Select from: { r, i, h, h, i, o, h, i, o } Selects an “o”

Phrase now: “the wo”, last 2 letters “wo”, and continue…

 
Note, the array can only be empty in one of two circumstances – 1) the starting phrase does not occur, or 2) the only instance of the phrase is the last n letters of the corpus. Also note that the search can either use or ignore case, and also punctuation. For more realism, you need punctuation.
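The lookup step on its own can be sketched like this. It is a minimal sketch in Python, with a function name made up for illustration, and it assumes a case-sensitive search. It returns an empty list in exactly the two circumstances noted above.

```python
def next_chars(corpus, context):
    """Return every character that directly follows `context` in `corpus`."""
    found = []
    start = corpus.find(context)
    while start != -1:
        end = start + len(context)
        if end < len(corpus):            # a match at the very end has no successor
            found.append(corpus[end])
        start = corpus.find(context, start + 1)
    return found


corpus = (
    "This is a sample corpus that the test can write and think about "
    "without the test failing in its demonstration of good use. Lets see "
    "what happens when the test gets given the corpus with a length of "
    "just two and a starting word of “th” and whether it will work."
)

# Show the candidate list for each step of the walk-through
for context in ("th", "he", "e ", " w"):
    print(repr(context), next_chars(corpus, context))
```

Letters that follow the context more often appear more often in the list, so a plain random pick from it is already weighted by the writer's habits.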

Now. The big question you’re all asking is “does this really work?” Well, the script to find out is at the end of this post. In it I have adapted the formula slightly: the system takes the phrase compiled in, and it uses new line chars to mark the end of a sentence. Here’s where some problems start coming up – the corpus should preferably be long, with a lot of variety in phrases and so on. Also the length of the search pattern is important – the longer the length, the more precise and less flexible; the shorter, the more random. Above I used a length of two – and this is very unsuitable because there is no connection between words – once a space is introduced, you only base the starting letter of the next word on the last letter of the previous one. Certainly lots of enhancements to this algorithm can be written.

But does it WORK I hear you cry. Well what I did (thinking I would be very smart and clever) is take all the post written to the end of the next sentence, and run it through the system with a starting phrase of “But” and a length of 3. When it hits a new line, it will end, and I will post the entire thing verbatim as it comes out of the system here (and be warned, because I repeat that test sentence several times that will skew the results) but here we go!

But’ and slightly. (if a functuatime long phrase, bufferelation an eith of just 2 letterson AI, but write array base corpus wheith a space you take test the letten the no of sequence of 3. When by old takes is part. The is very uses this test characterson of 3. Well work.

If we take the same text and increase the length to 4 (and, of course, change the starting phrase to “But ”) then we get the following:

 

But does new line, it with a length, the come. The initial parting <strong>out two and whether is probability instead a script to show how how how easy it to pick to picked from it to be buffered style warned, you takes text sequence occasion of “the test can <strong> w</strong>e </strong>ork.

Note that the length of 4 shows a much more “English” set of words. The random snippets of HTML in there come from the actual design. If we increase to 5, 6 and 7 we get the following three paragraphs:
But does not occurs in the lenght of the array. (if the array can be written.

But does not occur, and go again, still only instance of this is a sample corpus that sequence occur, and I will skew the results) but here we go!

But does it WORK I hear you cry. Well what I did (thinking I would be very smart and clever) is take <b>all</b> the post written to the end of this project is to generate a function does not occur, and 2) the only instance of the phrase is the length of 3. When it hits a new line chars to mark the entire thing verbatim as it comes out of this post, In this I have long as the languages to show how easy it is to document a relatively old trick to replace someone with a length is 3 and this is where the corpus with a length of search:
Hopefully now you will see the problem: the length of 7 produces words which are the most clear, in sentences which are very clear English, but they are snatches taken verbatim from the first half of the post. With longer sequences, the average number of available options decreases – there has to be a trade-off between the number of choices available to the script and the readability of the output it produces.
And now for the source code:

 

        using System;
        using System.Collections.Generic;

        class Program {
            static readonly Random r = new Random();

            static void Main(string[] args) {
                // Either \r or \n in the corpus will cause the paragraph to come to an end.
                string corpus = @"A Long String in here with newlines";
                // Starting phrase: at least "length" characters long, and it must appear in the corpus.
                string startup = "A Lon";

                int length = 5;

                Console.Write(startup);
                bool alive = true;

                // Only the last "length" characters are used as the search window.
                string start = startup.Substring(startup.Length - length);

                while (alive) {
                    try {
                        char next = lookCorup(corpus, start);
                        if (next == '\n' || next == '\r') {
                            alive = false;
                        }
                        start = start.Substring(1) + next;
                        Console.Write(next);
                    } catch (Exception) {
                        // No candidates left (see lookCorup), so stop generating.
                        alive = false;
                    }
                }

                Console.ReadKey();
            }

            // Collects every character that follows an occurrence of "start" in the
            // corpus and returns one of them at random.
            private static char lookCorup(string Corpus, string start) {
                List<char> next = new List<char>();
                int last = Corpus.IndexOf(start, 0);

                while (last != -1) {
                    // A match at the very end of the corpus has nothing after it.
                    if (last + start.Length < Corpus.Length) {
                        next.Add(Corpus[last + start.Length]);
                    }

                    last = Corpus.IndexOf(start, last + 1);
                }

                if (next.Count > 0) {
                    return next[r.Next(0, next.Count)];
                } else {
                    throw new Exception("End of Line");
                }
            }
        }
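Since the idea is to port the function to other languages, here is a rough Python version of the same loop. It is a sketch rather than a line-for-line translation: the function name and the rng parameter are made up for illustration, and the max_chars cap is my own addition so that a corpus which loops back on itself cannot run forever.

```python
import random


def generate(corpus, seed, n, rng=random, max_chars=500):
    """Extend `seed` one character at a time, using the last n characters as context."""
    output = seed
    while len(output) - len(seed) < max_chars:
        context = output[-n:]
        # Collect every character that follows the context somewhere in the corpus
        candidates = []
        start = corpus.find(context)
        while start != -1:
            end = start + len(context)
            if end < len(corpus):
                candidates.append(corpus[end])
            start = corpus.find(context, start + 1)
        if not candidates:
            break                        # nothing follows this context: stop
        next_char = rng.choice(candidates)
        output += next_char
        if next_char in "\r\n":
            break                        # a new line marks the end of the paragraph
    return output


corpus = "A Long String in here with newlines"
print(generate(corpus, "A Lon", 5))
```

With this tiny corpus every five-letter window is unique, so the output simply reproduces the corpus; a longer corpus is where the randomness shows up.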