Simple Elliptic Curve Cryptography in Python compatible with seccure

Instead of RSA, you can use an Elliptic Curve algorithm for public/private key cryptography. The main advantage is that keys are a lot smaller. With RSA you need keyservers to distribute public keys. With Elliptic Curves, you can just write: my public key is *jMVCU^[QC&q*v_8C1ZAFBAgD.

There are two drawbacks: first, Elliptic Curve cryptography is even harder to understand than plain RSA and secondly, there are only a few implementation of Elliptic Curve cryptography. In fact: I did not find any maintained Elliptic Curve package for Python.

Thus I wrote a Python package compatible with the excellent commandline utility seccure written by Poettering. Here are some examples of how to use the original commandline seccure and how to do the same thing in Python.

For a private key, you just pick a (long!) password. You can derive the public key with seccure as follows:

$ seccure-key
Assuming curve p160.
Enter private key: my private key
The public key is: 8W;>i^H0qi|J&$coR5MFpR*Vn

In Python

>>> import seccure
>>> str(seccure.passphrase_to_pubkey(b'my private key'))
'8W;>i^H0qi|J&$coR5MFpR*Vn'

Now, to encrypt a message for the public key:

$ seccure-encrypt -o private.msg '8W;>i^H0qi|J&$coR5MFpR*Vn'
Assuming MAC length of 80 bits.
Go ahead and type your message ...
This is a very secret message!
^D

In Python:

>>> ciphertext = seccure.encrypt(b'This is a very secret message\n', b'8W;>i^H0qi|J&$coR5MFpR*Vn')
>>> ciphertext
'\x00\x146\x17\xe9\xc1\x1a\x7fkX\xec\xa0n,h

To decrypt the encrypted message:

$ seccure-decrypt -i private.msg
Assuming MAC length of 80 bits.
Assuming curve p160.
Enter private key: my private key
This is a very secret message!
Integrity check successful, message unforged!

In Python:

>>> seccure.decrypt(ciphertext, b'my private key')
'This is a very secret message\n'

To create a signature

$ seccure-sign
Assuming curve p160.
Enter private key: my private key
Go ahead and type your message ...
This message will be signed
^D

Signature: $HPI?t(I*1vAYsl$|%21WXND=6Br*[>k(OR9B!GOwHqL0s+3Uq

In Python:

>>> seccure.sign(b'This message will be signed\n', b'my private key')
'$HPI?t(I*1vAYsl$|%21WXND=6Br*[>k(OR9B!GOwHqL0s+3Uq'

And to verify a signature:

$ seccure-verify '8W;>i^H0qi|J&$coR5MFpR*Vn' '$HPI?t(I*1vAYsl$|%21WXND=6Br*[>k(OR9B!GOwHqL0s+3Uq'
Go ahead and type your message ...
This message will be signed
^D

Signature successfully verified!

In Python:

>>> seccure.verify(b'This message will be signed\n', b'$HPI?t(I*1vAYsl$|%21WXND=6Br*[>k(OR9B!GOwHqL0s+3Uq', b'8W;>i^H0qi|J&$coR5MFpR*Vn')
True

You can find the Python library on Github.

Update: added support for Python 2.6, 2.7, 3.2 and 3.3.

msgpack for pypy

msgpack is a fast and small binary format for json. The Python msgpack module is even a drop-in replacement for the json module.

PyPy is an implementation of Python in Python with a tracing JIT. If your Python script runs for more than 1 second, it will probably be quite faster on PyPy.

PyPy is almost a drop-in replacement for Python. Only extensions modules written in C can be a problem — for instance the msgpack module.

Thus I wrote a pure Python fallback for the msgpack module, which has been merged upstream. As a rough benchmark, I measured the pack and unpack time of a 30MB msgpack-object from the wild.

Note that:

  • The execution times for the fallback with normal Python are all off the chart (15.4s, 15.1s, 10.1s, 10.0s).
  • PyPy is faster on the second run: it had time to trace and optimize at runtime.
  • For packing, PyPy (nightly) is almost as fast as the original C extension.
  • For unpacking, PyPy (nightly) is 340% slower. Yet it more than 10 times faster than normal Python.

This code should be in the next official release of msgpack-python. If you want to use it already, check out the git repository.

The first version did not run this fast on PyPy. With PyPy’s jitviewer the compiled code and assumptions of PyPy can be examined. These are the tweaks I used in descending order of impact.

  • Instead of StringIO, the fallback uses PyPy’s own StringBuilder. StringIO allows writing, reading, seeking and what more. StringBuilder only allows appending and thus it is easier for PyPy to optimize. This increased performance of packing by an order of magnitude.
  • Using constant format strings for struct.unpack allows PyPy to optimize it. Thus struct.unpack("I", f.read(4)); f.read(n) instead of struct.unpack("I%ds"%n, f.read(4+n))
  • Using stream.write(a); stream.write(b) instead of stream.write(a+b) increased performance.
  • Adding an explicit fastpath
  • PyPy usually specializes a function well for its most common path, however in some cases it needs help. In this case a function returns a concatenation of several strings. However: in the most common case it only does one: ret=''; ret+=something; return ret. PyPy does not recognize that ret is not needed at all in this case, so I added an if-statement before initializing it for the case where there is only one concatenation.

Clearly the unpacking code could be faster, if anyone with expertise on PyPy’s JIT would look at it, that would be great.

Fix excessive disk usage of Sparrow for Mac

It may happen that Sparrow for Mac uses an excessive amount of disk space. For only the headers of my 9GB of e-mail, Sparrow used 39GB. I found a work-around.

The problem
Sparrow uses a Tokyo Cabinet, called data.tch, as storage. Mine was 39GB. What could eat up all that space? I only have 9GB of e-mail in total. With a simple script I looked into the cabinet and found it actually contained less than 1GB of data. The cabinet was wasting a lot of space by not being packed tightly.

The work-around
To repack the Tokyo Cabinet, run:

tchmgr optimize "~/Library/Containers/com.sparrowmailapp.sparrow/Data/Library/Application Support/Sparrow/(your-email@gmail.com).sparrowdb/data.db/data.tch"

Replace (your-email@gmail.com) with your e-mail address. You can install the tchmgr program using sudo port install tokyocabinet, if you are a MacPorts user.

This reduced the size of the cabinet to 1.1GB!

Cause?
It is odd that the cabinet became so loosely packed. It may happen if Sparrow adds and removes a lot of entries, but there seems no reason for Sparrow to do this. There might be an underlying bug. I reported this issue to the developers, but they have yet to respond.

Bertha: no-nonsense blob storage

For a project I need to store blobs of data (~10MB) and access them over TCP. I don’t need any features like removing, updating, authentication, statistics or replication: it’s simply not required or already handled by some other part of the project.

I only want to be able to store a blob and receive a key for it; retrieve a blob by its key and list all keys of the blobs stored.

I couldn’t find anything that fit it. Thus I created bertha. Lets delve right into it, shall we:

To run the server:

./berthad-vfs 0.0.0.0 1234 tmp data

There’s a python client

>>> from bertha import BerthaClient
>>> c = BerthaClient('serf', 1234)
>>> list(c.list())
[]
>>> key = c.put_str('Hello world')
>>> key
'64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c'
>>> c.get(key).read()
'Hello world'
>>> list(c.list())
['64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c']
>>> ctx = c.put()
>>> ctx.f.write("Do some")
>>> ctx.f.write("Streaming")
>>> ctx.finish()
'975001fb9bdc0f72a78ca6326c55af86348d4c84da7ba47b7ed785a03f6803b0'
>>> c.get('975001fb9bdc0f72a78ca6326c55af86348d4c84da7ba47b7ed785a03f6803b0').read()
'Do someStreaming'

Which also install a bertha commandline tool:

$ bertha list
975001fb9bdc0f72a78ca6326c55af86348d4c84da7ba47b7ed785a03f6803b0
64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c
$ bertha get 975001fb9bdc0f72a78ca6326c55af86348d4c84da7ba47b7ed785a03f6803b0
Do someStreaming
$ echo Hi | bertha put
c01a4cfa25cb895cdd0bb25181ba9c1622e93895a6de6f533a7299f70d6b0cfb
$ bertha get c01a4cfa25cb895cdd0bb25181ba9c1622e93895a6de6f533a7299f70d6b0cfb tmp
$ cat tmp
Hi

The berthad code is pretty small: at the moment under a thousand lines of C with lots of comments. GETs are pretty fast: berthad uses Linux’ splice syscall, which usually makes the network card read directly from the buffer the harddisk wrote to.

Thoughts on Flash

I have some remarks on Steve Jobs’s “Thoughts on Flash”.

  1. “Adobe claims that we are a closed system, and that Flash is open, but in fact the opposite is true.”. I agree with Steve: Flash is pretty closed. However, the iP(hone/od/ad) isn’t open either.
    1. H.264, which Jobs touts as a great modern replacement of flash, is patented. You have to pay whatever the MPEG LA fancies you to pay for use of the standard. This is not really different from the control of Adobe over Flash.
    2. iPhone OS is closed. You need to buy yourself into the iPhone Development Program. Again Apple can shut the program down whenever they like, which is not really different from the control of Adobe over Flash development.
  2. Safari is an intermediate layer and creates sub-standard apps. Jobs claims that third-party intermediate layers result in sub-standard apps. Jobs argues that an intermediate layer will keep developers from platform enhancements, won’t result in targeted great apps and won’t put apps directly on the shoulder of the platform. If he is thinking about Safari, he is right: web-apps for the iPhone just don’t feel great, can’t use all the platform enhancements and don’t result in great targeted apps.

Unique followers on Twitter

As pointed out by Jorg Kennis, Twitter‘s new lists feature make it hard to determine the amount of unique followers. I’ve written a simple script, using a slightly modified TweePy, to determine the amount of unique followers.

Example usage:

bas@w-nz ~/twitter-unique-followers $ python twitter-unique-followers.py JorgK -ubwesterb
Password:
rate_limit_status: remaining_hits: 112
Jorg Kennis
 followed directly by 182
 in lists
  RPtje/vriendenbekenden subscribed to by 0
  JorgK/TechNL subscribed to by 0
  sentfanwyaerda/nijmegen1 subscribed to by 9
  JorgK/Friends subscribed to by 0
  sjorsjes/Community subscribed to by 0
  robinspeijer/iPhone subscribed to by 1
  nielsschooneman/iPhone subscribed to by 0
  JeanPaulH/iPhoneclub subscribed to by 5
number of unique followers: 198

You can download it here. I could write a simple webpage with the same functionality, if anyone would mind.

Normul

Normul normalizes URLs. It expands shortened URLs:

>>> from normul import normul
>>> normul('http://bit.ly/1I4VQ')
{'type': 'other', 'normalized': 'http://www.shinguz.ch/MySQL/mysql_mv.html', 'original': 'http://bit.ly/1I4VQ'}

And shows useful links for hosted-images:


>>> normul('http://yfrog.com/6c5krj')
{'image': {'full': 'http://img228.imageshack.us/img228/1079/5kr.jpg', 'thumbnail': 'http://img228.imageshack.us/img228/1079/5kr.th.jpg'}, 'type': 'image', 'original': 'http://yfrog.com/6c5krj', 'normalized': 'http://yfrog.com/6c5krj'}

You can find the simple but convenient sourcecode here.

Cantor never bores (1)

Given a set of countable sets K, such that K is totally ordered by inclusion, videlicet for every A,B\in K either A\subseteq B or A\supseteq B. Intuitively, for at every step in this chain one element at least must be added, one expects the set K to be countable as well.

Suppose K is countable. Then the union, \bigcup K is a countable union of countable sets, hence countable. (Suppose k: \mathbb N \to K is an enumeration of K and f_i: \mathbb N \to k(i) enumerations of the elements of the chain. Then f_0(0), f_1(0), f_0(1), f_2(0), f_1(1), f_0(2), \ldots enumerates \bigcup K.)

Thus \bigcup K is an upper bound of K. In the poset of countable subsets of some set U, of which \bigcup K is a subset, every non-empty chain has an upper bound. Hence, using Zorn’s lemma there is a maximal element, say M.

Suppose U is uncountable, then there exists a \star \in U \backslash M. M \cup \{\star\} is most definitely also countable and M \subset M \cup \{\star\} which contradicts M’s maximality. We are forced to conclude that there exists an uncountable chain of countable sets.

Cantor’s set theory keeps surprising.

Update: an example of such a chain is the set of the countable ordinals.

Another update: a “more concrete” example are the downsets in \mathbb Q without the empty set and \mathbb Q itself. These downsets correspond to real numbers, see Dedekind Cuts.