06 Feb 2012
Planet Python
Tarek Ziade: Scaling Crypto work in Python
We're building a new service at Services called the Token Server. The idea is simple: give us a Browser ID assertion and a service name, and the Token Server will send you back a token that's good for 30 minutes for that specific service.
That indirection makes our lives easier when managing user authentication and resource allocation for our services. A few examples:
- when a new user wants to use Firefox Sync, we can check which server has the smallest number of allocated users, and tell the user to go there
- we can manage a user from a central place
- we can manage a user we've never heard about before without asking her to register specifically to each service - that's the whole point of Browser ID
I won't get into more details because that's not the intent of this blog post. But if you are curious, the full draft spec is here - https://wiki.mozilla.org/Services/Sagrada/TokenServer
What this post is really about is how to build this token server.
The server is a single web service that gets a Browser ID assertion and does the following:
- verify the assertion
- create a token, which is a simple JSON mapping
- encrypt and sign the token
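To make the "create a token, then sign it" steps concrete, here is a minimal sketch using only the Python standard library. It is not the Token Server's actual format: the field names, the `SECRET` value and the `make_token` / `check_token` helpers are all assumptions made for illustration, and the encryption step is left out because the standard library has no cipher for it.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"not-a-real-secret"   # assumption: a key shared with the target service
TOKEN_TTL = 30 * 60             # tokens are good for 30 minutes


def make_token(user, service, now=None):
    """Build a JSON mapping and sign it with HMAC-SHA256 (hypothetical format)."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"user": user, "service": service, "expires": int(issued) + TOKEN_TTL},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + signature


def check_token(token, now=None):
    """Verify the signature and the expiration date, then return the mapping."""
    encoded, signature = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    data = json.loads(payload)
    current = time.time() if now is None else now
    if data["expires"] < current:
        raise ValueError("token expired")
    return data


token = make_token("alice@example.com", "sync", now=1000)
claims = check_token(token, now=1000 + 60)
print(claims)
```

The signing call is the kind of CPU-bound work the rest of this post is about: cheap for one token, expensive once many requests arrive at the same time.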
The GIL, Gevent, greenlet and the like
Implementing this using Cornice and a crypto lib is quite simple, but has one major issue: the crypto work is CPU intensive, and even if the libraries we can use have C code under the hood, it seems that the GIL is not released enough to let your threads really use several cores. For example, we benchmarked M2Crypto and it was obvious that a multi-threaded app was locked by the GIL.
We don't use threads in our Python servers anyway: we use Gevent workers, which are based on greenlets. While greenlets help with I/O-bound calls, they don't help with CPU-bound work: you're tied to a single thread in this case, and each greenlet that does some CPU work blocks the other ones.
It's easy to demonstrate - see http://tarek.pastebin.mozilla.org/1476644. If I run it on my MacBook Air, the pure Python synchronous version is always faster (huh, the gevent version is *much* slower, not sure why...).
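The pastebin is not reproduced here. As a rough stand-in, the sketch below shows the same effect with asyncio from the standard library instead of Gevent (a swapped-in technique, so treat it as an analogy rather than the original benchmark). The `cpu_work` and `crypto_task` names are hypothetical. Because the CPU-bound function never yields, the two "concurrent" tasks simply run one after the other.

```python
import asyncio
import hashlib


def cpu_work(rounds=20000):
    """Stand-in for crypto work: repeated hashing, pure CPU, no I/O."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def crypto_task(name, log):
    log.append(name + " start")
    cpu_work()                  # never yields control to the event loop
    log.append(name + " end")


async def run():
    log = []
    # Both tasks are scheduled together, but neither can interleave
    # with the other because cpu_work() blocks the single thread.
    await asyncio.gather(crypto_task("a", log), crypto_task("b", log))
    return log


events = asyncio.run(run())
print(events)
```

Task "b" only starts once task "a" has finished, which is the behaviour described above for greenlets doing CPU work.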
So the sanest option is to use separate processes and set up a messaging queue between the web service that needs some crypto work to be done and specialized crypto workers.
In that case we're back to our beloved 100% I/O-bound model, which we know how to scale using Nginx + Gunicorn + Gevent.
For the crypto workers, we want them to be as fast as possible, so we started to look at Crypto++, which seems promising because it uses CPU-specific calls in ASM. The pycryptopp binding is available to work with Crypto++, but we happen to need some features that are not available in that lib yet - like HKDF.
Yeah, at that point it became obvious we'd use pure C++ for that part, and drive it from Python.
Message passing
Back to our Token server - we need to send crypto work to our workers and get back the result. The first option that comes to mind is to use multiprocessing to spawn our C++ workers and to feed them with work.
The model is quite simple, but now that we have one piece in C++, it's getting harder to use the built-in tools in multiprocessing to communicate with our workers - we need to work at a lower level, with signals or sockets. And well, I am not sure what would be left of multiprocessing then.
This is doable but a bit of a pain to do correctly (and in a portable way). Moreover, if we want to have a robust system, we need to have things like a heartbeat, which requires more inter-process message passing. And now I need to code it in both Python and C++.
Hold on - Let me summarize my requirements:
- inter-process communication
- something less painful than signals or sockets
- very very very fast
I got tempted by Memory Mapped Files, but the drawbacks I've read here and there scared me.
ZeroMQ
It turns out ZeroMQ is perfect for this job - there are clients in Python and C++, and defining a protocol to exchange data from the Python web server to the crypto workers is quite simple.
In fact, this can be done as a reusable library that takes care of passing messages to workers and getting back results. It has been done hundreds of times and there are many examples on the zmq website, but I have failed to find any packaged Python library that would let me push some work to workers transparently, via a simple execute() call - if you know one, tell me!
So I am building one, since it's quite short and simple. The project is called Powerhose and is located here: https://github.com/mozilla-services/powerhose.
Here are its characteristics and limitations:
- Powerhose is based on a single master and multiple workers protocol
- The Master opens a socket and waits for workers to register themselves into it
- The worker registers itself with the master, provides the path to its own socket, and waits for some work on it.
- Workers perform the work synchronously and send back the result immediately.
- The master load-balances on available workers, and if all are busy waits a bit before it times out.
- The worker pings the master on a regular basis and exits if it's unable to reach it. It attempts to reconnect several times to give the master a chance to come back.
- Workers are language agnostic and a master could run heterogeneous workers (one in C, one in Python etc..)
- Powerhose is not serializing/deserializing the data - it sends plain strings. This is the responsibility of the program that uses it.
- Powerhose is not responsible for respawning a master or a worker that dies. I plan to use daemontools for this, and maybe provide a script that runs all workers at once.
- Powerhose does not queue work and just relies on zeromq sockets.
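The load-balancing and timeout rules in the list above can be sketched in a few lines. This is not Powerhose code and it does not use zeromq: it is an in-process simulation built on the standard library's `queue` and `threading` modules, and the `SketchMaster` and `SketchWorker` names are hypothetical. Jobs and results are plain strings, as in the description.

```python
import queue
import threading


class SketchWorker(threading.Thread):
    """Hypothetical worker: takes a string job, sends back a string result."""

    def __init__(self, handler):
        super().__init__(daemon=True)
        self.handler = handler
        self.inbox = queue.Queue(maxsize=1)

    def run(self):
        while True:
            job, reply = self.inbox.get()
            reply.put(self.handler(job))    # synchronous work, immediate reply


class SketchMaster:
    """Hypothetical master: hands each job to an idle worker, or times out."""

    def __init__(self, timeout=1.0):
        self.idle = queue.Queue()
        self.timeout = timeout

    def register(self, worker):
        worker.start()
        self.idle.put(worker)

    def execute(self, job):
        try:
            # Wait a bit for a free worker; give up if all of them stay busy.
            worker = self.idle.get(timeout=self.timeout)
        except queue.Empty:
            raise TimeoutError("all workers are busy")
        reply = queue.Queue(maxsize=1)
        try:
            worker.inbox.put((job, reply))
            return reply.get()
        finally:
            self.idle.put(worker)           # the worker is available again


def square(job):
    # Serialization is the caller's job: strings in, strings out.
    return str(int(job) ** 2)


master = SketchMaster(timeout=0.5)
for _ in range(2):
    master.register(SketchWorker(square))
result = master.execute("12")
print(result)
```

A real implementation would replace the in-process queues with sockets and add the registration and heartbeat messages, which is where most of the actual work lies.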
The library implements this protocol and gives two tools to use it:
- A JobRunner class in Python that you can use to send some work to be done
- A Worker class in Python and C++ that you can use as a base class to implement workers
Here's an example of using Powerhose:
- The Server - https://github.com/mozilla-services/powerhose/blob/master/examples/square_master.py
- The Python worker - https://github.com/mozilla-services/powerhose/blob/master/examples/square_worker.py
- The C++ worker (don't look at the code) - https://github.com/mozilla-services/powerhose/blob/master/examples/square_worker.cpp
For the Token server, we'll have:
- A JobRunner in our Cornice application
- A C++ worker that uses Crypto++
The first benches look fantastic - probably faster than anything I'd have implemented myself using plain sockets.
I'll try to package Powerhose so other projects at Mozilla can use it. I am wondering if this could be useful to more people, since I failed to find that kind of tool. How do you scale your CPU-bound web apps?
06 Feb 2012 8:17am GMT
Bit of Cheese: A few more random bits
pytagcloud - is one to watch: make tag clouds as PNG images or HTML. Usage is a bit fiddly at the moment and I couldn't replicate the results they got. I think the key is having a good tag (interesting word) extractor. This bit of code might come in handy when experimenting with it:
import re
from roundup.backends.indexer_common import STOPWORDS
import requests, collections, bs4

soup = requests.get('http://www.python.org/about/').text
text = bs4.BeautifulSoup(soup).find('div', id='content-body').get_text()

counts = collections.defaultdict(int)
for word in re.split(r'\W+', text):
    if word.upper() not in STOPWORDS and len(word) > 2:
        counts[word.lower()] += 1

words = sorted((count, word) for word, count in counts.items())
tags = [(word, count) for count, word in words[-30:]]

from pytagcloud import make_tags, create_tag_image
create_tag_image(make_tags(tags), 'cloud.png')
slumber - call RESTful web (HTTP) APIs from Python code. Supports JSON and YAML (with pyyaml installed) and is built on top of the awesome requests. While looking at slumber I picked up this tip for validating and pretty-printing JSON:
$ echo '{"json":"obj"}' | python -m json.tool
{
    "json": "obj"
}
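The same validation and pretty-printing can be done from inside a program with the standard `json` module. A small sketch, where the variable names are just for illustration:

```python
import json

raw = '{"json":"obj"}'

# json.loads() raises ValueError if the text is not valid JSON,
# so parsing doubles as validation.
parsed = json.loads(raw)

# indent=4 gives a four-space, one-key-per-line layout.
pretty = json.dumps(parsed, indent=4)
print(pretty)
```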
06 Feb 2012 6:07am GMT
Joe Abbate: Automated Database Augmentation
Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.
If you have some experience as a DBA, the word "drudgery" may have come to mind at the prospect of implementing the above features. It's possible that, after a while, you've developed an approach for dealing with some of them but still wish there'd be some way to automate these thankless tasks.
You may have looked at the Andromeda project's "automations" which provide some of these capabilities. However, in order to take advantage of the automations, you'll first have to manually describe your database in a YAML format (and you'll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you'd still have to create a YAML schema (no need for Apache, but you still need PHP).
Some relief is forthcoming. Following the discussions prompted by my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend [1] and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.
Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:
schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
  ...
  table store:
    audit_columns: default
The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,
dbextend pagiladb audext.yaml
dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Third, it reads the extension configuration information, e.g., a definition of which columns need to be added for "audit_columns: default" (say, modified_timestamp and modified_by_user), which trigger(s) to add, and which function(s) to create.
The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.
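To illustrate the merge step described above, here is a rough Python sketch. It is not Pyrseas code: the dictionary layout, the `AUDIT_CONFIGS` table, the column types and the `extend_schema` function are all assumptions made for the example. Only the two column names and the "audit_columns: default" setting come from the text.

```python
import copy

# Assumption: what "audit_columns: default" expands to. The column names are
# from the post; the types and not_null flags are invented for illustration.
AUDIT_CONFIGS = {
    "default": [
        {"modified_timestamp": {"type": "timestamp with time zone",
                                "not_null": True}},
        {"modified_by_user": {"type": "character varying(63)",
                              "not_null": True}},
    ]
}


def extend_schema(db_map, ext_map, configs=AUDIT_CONFIGS):
    """Return a copy of db_map with the requested audit columns added."""
    result = copy.deepcopy(db_map)
    for schema_key, tables in ext_map.items():
        for table_key, features in tables.items():
            columns = result[schema_key][table_key].setdefault("columns", [])
            existing = {name for col in columns for name in col}
            for col in configs[features["audit_columns"]]:
                name = next(iter(col))
                if name not in existing:    # don't add a column twice
                    columns.append(copy.deepcopy(col))
    return result


# Stand-in for what would be read from the catalogs (hypothetical layout).
db_map = {"schema public": {"table actor": {"columns": [
    {"actor_id": {"type": "integer", "not_null": True}},
]}}}
# Stand-in for the parsed extensions file.
ext_map = {"schema public": {"table actor": {"audit_columns": "default"}}}

extended = extend_schema(db_map, ext_map)
column_names = [name
                for col in extended["schema public"]["table actor"]["columns"]
                for name in col]
print(column_names)
```

Triggers and functions would presumably be merged into the map the same way, before the whole structure is written out as YAML.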
In case you're wondering, dbextend, like other Pyrseas tools, will require Python, psycopg2 and pyyaml.
What features would you like to see automated? What are your suggested best practices for automating these common needs?
Picture credit: Thanks to Mr. O'Brien, a fourth-grade teacher in Minnesota.
[1] We're still receptive to some other suitable name.
06 Feb 2012 3:38am GMT
EmptysquarePython: This Thursday: a talk on Python, MongoDB, and asynchronous web frameworks

This Thursday in NYC I'm talking about Python, MongoDB, and asynchronous web frameworks at a meetup called For the Love of Python: Wine tasting, Red velvet cupcakes, and Tech Talks. The talk is a work in progress. To be strictly accurate, I have not yet started working on the talk, because the code I'll be talking about is itself a work in progress. But come anyway, because I've been thinking a lot on this subject for the last few months, and I intend to present:
- A high-level discussion of what an async web framework is and when you need it, or don't. I think there's a lot of sloppiness on this subject, and I want to work with the audience on tightening up our thinking.
- A review of pymongo, pthreads, Tornado, asyncmongo, and gevent. You won't be disappointed.
- For the first time ever, I will present an exclusive sneak peek at my own experimental Python driver for MongoDB and Tornado, built on top of the official pymongo driver. It's pretty snazzy, it uses greenlets, and it's an example of a general pattern for asynchronizing synchronous database drivers that might inspire you to write your own database driver in Python. Buckle your seatbelts, we're going deep.
06 Feb 2012 1:57am GMT
05 Feb 2012
Planet Python
Kenneth Reitz: Amon - Python-powered server monitoring, logging, and error reporting with JSON API
Amon - Python-powered server monitoring, logging, and error reporting with JSON API:
Amon from Martin Rusev is a simple yet flexible way to add server monitoring, logging, and error tracking to your web stack. Amon consists of three parts: a collector daemon, a Python web app, and a JSON API.
- Collector daemon - Amon's server and process monitoring is a thin wrapper on top of Unix tools to record metrics and store them in the MongoDB backend.
- API - Shipping with language bindings for Python, Ruby, and JavaScript, Amon's JSON API makes it easy to record your own application events.
- Web interface - The web app provides a friendly user interface for viewing logs and visualizing data in charts.
The Amon documentation site is a great place to get started with installation and usage.
05 Feb 2012 10:00pm GMT
Reinout van Rees: Apple lion reinstall experience and surprise
My year-old macbook installation was showing its age. Or rather, there were some things wrong with it:
- The original OS was 10.6, snow leopard. I upgraded it to lion (10.7) half a year ago. This was an in-place upgrade, not a fresh install. I wanted a fresh install to clean some stuff up and because it started to feel slow. I heard that a clean install would help a lot regarding speed.
- I work a lot with geographic libraries, Django and geodjango. So originally I installed everything via the kyngchaos packages. Mapnik, gdal, spatialite and so on. But after the lion upgrade, I couldn't compile any python packages with C extensions anymore as gcc 4.0 (which everything had been built with) had been replaced by 4.2. And spatialite never would work right anyway. So I wanted to replace this.
- I used homebrew as a package manager for the gnu/unix side of things instead of macports I'd been using before. It works, but I missed some things, like Quantum GIS (QGIS), which is included in macports. I hoped to get everything python+gis related done with one package manager, in my case macports.
So I made sure my backups were OK, that my code was all committed, that my repositories were cleaned up, that all my dotfiles in my homedir were in version control and so on. Most of it was already OK, but of course there were some small things left. I'll do a write-up later on of my backup strategy and how I handle my dotfiles and so on.
Time for the actual lion reinstall. How does that work? I bought Lion from the app store, so it was downloaded and installed by my mac: I didn't have an install DVD. Turns out to be easy: just restart and press command-r during bootup and you'll get a "lion recovery" menu. Choose the reinstall option and it will download the latest full version and install it for you. Simple and works.
The big surprise came when the computer rebooted. I expected a dialog to set up a main user. Instead, I got the regular login screen. Ok... Logging in... Hey! All my stuff is still there! All the settings, all my documents, all my music... No need to restore backups.
So: an OSX lion restore wipes only the OS and reinstalls it. Including xcode, btw. The rest (your own data, applications, settings) is retained. Actually pretty handy.
This did mean I had to clean up the kyngchaos packages and homebrew by hand. Just a matter of deleting some directories, telling homebrew to erase itself and adjusting my paths.
05 Feb 2012 8:12pm GMT
Ruslan Spivak: Pyramid is Awesome for Beginners
I know some people think that Pyramid is a complex framework and that aspiring Python web developers should start with something else and maybe come back to Pyramid later.
But I truly believe that even if you're a total beginner Pyramid can serve you really well and once you grow more experienced, you'll appreciate the power that you get with the Pyramid framework.
Take a look at this "hello world" application in hello.py:
from wsgiref.simple_server import make_server
from pyramid.config import Configurator

def hello(request):
    return 'Hello %(name)s!' % request.matchdict

if __name__ == '__main__':
    config = Configurator()
    config.add_route('main', '/hello/{name}')
    config.add_view(hello, route_name='main', renderer='string')
    server = make_server('', 8080, config.make_wsgi_app())
    server.serve_forever()
It wasn't that difficult.
And this is how to install and run it:
$ pip install pyramid
$ python hello.py
Then in your browser type http://localhost:8080/hello/fred and see the result. Easy peasy.
So what are you waiting for?
Grab it here and give it a try!
05 Feb 2012 7:20pm GMT
Greg Taylor: PyATL Jam session this Tuesday
PyATL is having a Jam Session this Tuesday at 7PM, for those in the area and interested. In addition to presentations for getting started with a few web frameworks (Django, Bottle), there will be some hackage on various projects. I'll be there, looking to help people with, or work on, boto.
If you're interested in coming, RSVP on the Meetup page. If you're wanting to hack on, or get help with boto, shoot a tweet at me and let me know so I can be ready for you.
05 Feb 2012 6:40pm GMT
Vinay Sajip: Working with subprocess
The subprocess module provides some very useful functionality for working with external programs from Python applications, but is often complained about as being harder to use than it needs to be. See, for example, Kenneth Reitz's Envoy project, which aims to provide an ease-of-use wrapper over subprocess. There's also Andrew Moffat's pbs project, which aims to let you do things like
from pbs import ifconfig
print ifconfig("eth0")
Which it does by replacing sys.modules['pbs'] with a subclass of the module type which overrides __getattr__ to look for programs in the path. Which is nice, and I can see that it would be useful in some contexts, but I don't find that wc(ls("/etc", "-1"), "-l") is more readable than call("ls /etc -1 | wc -l") in the general case.
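To make the mechanism concrete, here is a minimal sketch of the general idea (not pbs's actual code): a subclass of the module type whose __getattr__ looks up programs on the path. The module name progs and the class name ProgramModule are made up for illustration, and shutil.which needs Python 3.3 or later.

```python
import shutil
import subprocess
import sys
import types


class ProgramModule(types.ModuleType):
    """Module subclass that resolves unknown attributes to programs on PATH."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails.
        path = shutil.which(name)
        if path is None:
            raise AttributeError('no program named %r on PATH' % name)

        def runner(*args):
            # Run the program and return its standard output as text.
            return subprocess.check_output([path] + list(args)).decode()

        return runner


# Register an instance under a hypothetical module name so that a normal
# import statement finds it in sys.modules.
sys.modules['progs'] = ProgramModule('progs')

import progs
```

With this in place, an attribute such as progs.ls would resolve to a callable wrapping the ls executable on a system that has one, and a name with no matching program raises AttributeError.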
I've been experimenting with my own wrapper for subprocess, called sarge. The main things I need are:
- I want to use command pipelines, but using subprocess out of the box often leads to deadlocks because pipe buffers get filled up.
- I want to use bash-style pipe syntax on Windows as well as Posix, but Windows shells don't support some of the syntax I want to use, like &&, ||, |& and so on.
- I want to process output from commands in a flexible way, and communicate() is not always flexible enough for my needs - for example, if I need to process output a line at a time.
- I want to avoid shell injection problems by having the ability to quote command arguments safely, and I want to minimise the use of shell=True, which I generally have to use when using pipelined commands.
- I don't want to set arbitrary limits on passing data between processes, such as Envoy's 10MB limit.
- subprocess allows you to let stderr be the same as stdout, but not the other way around - and I sometimes need to do that.
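As a point of comparison for the line-at-a-time requirement, this is roughly what it looks like with plain subprocess when only a single pipe is involved. It is a sketch under that assumption: once both stdout and stderr are piped, or commands are chained, reading just one pipe like this can block, which is the kind of problem described in the list above. The child command is a stand-in that uses sys.executable so the example runs anywhere.

```python
import subprocess
import sys

# A stand-in child process that prints three lines.
child_code = "for i in range(3): print('line %d' % i)"

proc = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode output to text
)

lines = []
# Iterating over the pipe yields one line at a time until the child
# closes its end of the pipe.
for line in proc.stdout:
    lines.append(line.rstrip('\n'))

proc.stdout.close()
returncode = proc.wait()
```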
I've been working on supporting these use cases, so sarge offers the following features:
- A simple run function which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run cross-platform on Posix and Windows without cygwin:
>>> p = run('false && echo foo')
>>> p.commands
[Command('false')]
>>> p.returncodes
[1]
>>> p.returncode
1
>>> p = run('false || echo foo')
foo
>>> p.commands
[Command('false'), Command('echo foo')]
>>> p.returncodes
[1, 0]
>>> p.returncode
0
- The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks:
>>> from sarge import shell_format
>>> shell_format('ls {0}', '*.py')
"ls '*.py'"
>>> shell_format('cat {0}', 'a file name with spaces')
"cat 'a file name with spaces'"
- The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want:
>>> from sarge import Capture, run
>>> with Capture() as out:
...     run('echo foobarbaz', stdout=out)
...
<sarge.Pipeline object at 0x175ed10>
>>> out.read(3)
'foo'
>>> out.read(3)
'bar'
>>> out.read(3)
'baz'
>>> out.read(3)
'\n'
>>> out.read(3)
''
A Capture object can capture the output from multiple commands:
>>> from sarge import run, Capture
>>> p = run('echo foo; echo bar; echo baz', stdout=Capture())
>>> p.stdout.readline()
'foo\n'
>>> p.stdout.readline()
'bar\n'
>>> p.stdout.readline()
'baz\n'
>>> p.stdout.readline()
''
Delays in commands are honoured in asynchronous calls:
>>> from sarge import run, Capture
>>> cmd = 'echo foo & (sleep 2; echo bar) & (sleep 1; echo baz)'
>>> p = run(cmd, stdout=Capture(), async=True) # returns immediately
>>> p.close() # wait for completion
>>> p.stdout.readline()
'foo\n'
>>> p.stdout.readline()
'baz\n'
>>> p.stdout.readline()
'bar\n'
>>>
Here, the sleep commands ensure that the asynchronous echo calls occur in the order foo (no delay), baz (after a delay of one second) and bar (after a delay of two seconds); the capturing works as expected.
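For readers who want to see the quoting idea behind shell_format in isolation, here is a small sketch built on the standard library's shlex.quote (Python 3.3 and later). It is not sarge's implementation: format_command is a hypothetical helper, and shlex.quote follows POSIX shell rules only, so it says nothing about Windows shells.

```python
import shlex


def format_command(template, *args):
    """Hypothetical helper: fill placeholders with shell-quoted values."""
    quoted = [shlex.quote(str(arg)) for arg in args]
    return template.format(*quoted)


listing = format_command('ls {0}', '*.py')
cat_cmd = format_command('cat {0}', 'a file name with spaces')
plain = format_command('cat {0}', 'notes.txt')
```

Values made up only of safe characters, such as notes.txt, are left unquoted; anything containing whitespace or shell metacharacters is wrapped in single quotes.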
Sarge hasn't been released yet, but it's not far off being ready. It's meant for Python >= 2.6.5 and is tested on 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux, Mac OS X, Windows XP and Windows 7 (not all versions are tested on all platforms, but the overall test coverage is comfortably over 90%).
I have released the sarge documentation on Read The Docs; I'm hoping people will read this and give some feedback about the API and feature set being proposed, so that I can fill in any gaps where possible and perhaps make it more useful to other people. Please add your comments here, or via the issue tracker on the BitBucket project for the docs.
05 Feb 2012 9:00am GMT
Ludvig Ericson: So you're a joker?
I read an article by some guy called SM on the subject of jokers; he's saying the world is full of jokers - people who talk a lot but do little.
I am a fuck-up at my current workplace - I handle sick leaves poorly, I show up for work five minutes late rather than five minutes early; I am a fuck-up at house chores - I rarely do the dishes, laundry is everywhere, cleaning is the last thing I think about; I sometimes fuck up with friends - I miss out on keeping in touch, I borrow money and forget about it, I hit on some poor guy's ex, the list goes on.
I am not a fuck-up in my true nature, in fact I'm probably more of an over-zealous Asperger kid inside. I don't give up before it's too late, and I find a way when I need to. I move heaven and earth, as SM puts it.
At first the logic seems counter-intuitive, but really it's an age-old problem: you have an infinite set of chores, and a limited rate of chore churning. How do you balance the workload; what do you do well, half-assed and not at all? More often than not, there is a conflict of interest between the various aspects of life. You have to call the shots.
The todo list is the only way to avoid being a joker. You will have to defer tasks. That's just reality. You will sometimes defer tasks up to a point where you realize, "ah man wish I was going to do this but I'm not." That's not being a joker, that's just you being rational.
So while I agree that it's a good thing to go into tunnel vision mode and just churn out a product in no time, it's also not a viable lifestyle. SM makes it seem as if the only way to live is 150% speed all the time and get rich.
Call me complicated, but I want more out of life than that. If what it takes to make piles of money is complete tunnel vision, then I shall have none of it. Let me sit smug-faced in my middle-class bed and enjoy life before it flashes me by.
05 Feb 2012 12:34am GMT
04 Feb 2012
Stefan Scherfke: Designing and Testing PyZMQ Applications – Part 1
ZeroMQ (or ØMQ or ZMQ) is an intelligent messaging framework, described as "sockets on steroids". That is, its sockets look like normal TCP sockets but actually work as you'd expect sockets to work. PyZMQ adds even more convenience to them, which makes it a really good choice if you want to implement a distributed application. Another big plus for ØMQ is that you can integrate sub-systems written in C, Java or any other language ØMQ supports (of which there are a lot).
If you've never heard of ØMQ before, I recommend reading ZeroMQ an Introduction by Nicholas Piël before you go on with this article.
The ØMQ Guide and PyZMQ's documentation are really good, so you can easily get started. However, when we began to implement a larger application with it (a distributed simulation framework), several questions arose which were not covered by the documentation:
- What's the best way to design our application?
- How can we keep it readable, flexible and maintainable?
- How do we test it?
I didn't find anything like a best-practice article that answered my questions. So in this series of articles, I'm going to talk about what I've learned during the last months. I'm not a PyZMQ expert (yet ;-)), but what I've done so far works quite well and I have never had more tests in a project than I have now.
You'll find the source for the examples at bitbucket. They are written in Python 3.2 and tested under Mac OS X Lion, Ubuntu 11.10 and Windows 7, 64 bit in each case. If you have any suggestions or improvements, please fork me or just leave a comment.
In this first article, I'm going to talk a bit about how you could generally design your application to be flexible, maintainable and testable. The second part will be about unit testing and finally, I'll cover process and system testing.
Comparison of Different Approaches
There are basically three possible ways to implement a PyZMQ application: one that's easy but limited in practical use, one that's more flexible but not really pythonic, and one that needs a bit more setup but is flexible and pythonic.
All three examples feature a simple ping process and a pong process with varying complexity. I use multiprocessing to run the pong process, because that's what you should usually do in real PyZMQ applications (you don't want to use threads and if both processes are running on the same machine, there's no need to invoke both of them separately).
All of the examples will have the following output:
(zmq)$ python blocking_recv.py
Pong got request: ping 0
Ping got reply: pong 0
...
Pong got request: ping 4
Ping got reply: pong 4
Let's start with the easy one first. You just use one of the socket's recv methods in a loop:
# blocking_recv.py
import multiprocessing
import zmq


addr = 'tcp://127.0.0.1:5678'


def ping():
    """Sends ping requests and waits for replies."""
    context = zmq.Context()
    sock = context.socket(zmq.REQ)
    sock.bind(addr)

    for i in range(5):
        sock.send_unicode('ping %s' % i)
        rep = sock.recv_unicode()  # This blocks until we get something
        print('Ping got reply:', rep)


def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.connect(addr)

    for i in range(5):
        req = sock.recv_unicode()  # This also blocks
        print('Pong got request:', req)
        sock.send_unicode('pong %s' % i)


if __name__ == '__main__':
    pong_proc = multiprocessing.Process(target=pong)
    pong_proc.start()

    ping()

    pong_proc.join()
So this is very easy and not that much code. The problem with this is that it only works well if your process only uses one socket. Unfortunately, in larger applications that is rarely the case.
A way to handle multiple sockets per process is polling. In addition to your context and socket(s), you need a poller. You also have to tell it which events on which socket you are going to poll:
# polling.py
def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.bind(addr)

    # Create a poller and register the events we want to poll
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN|zmq.POLLOUT)

    for i in range(10):
        # Get all sockets that can do something
        socks = dict(poller.poll())

        # Check if we can receive something
        if sock in socks and socks[sock] == zmq.POLLIN:
            req = sock.recv_unicode()
            print('Pong got request:', req)

        # Check if we can send something
        if sock in socks and socks[sock] == zmq.POLLOUT:
            sock.send_unicode('pong %s' % (i // 2))

    poller.unregister(sock)
You can see that our pong function got pretty ugly. You need 10 iterations to do five ping-pongs, because in each iteration you can either receive or reply. And each socket you add to your process adds two more if-statements. You could improve that design if you created a base class wrapping the polling loop and just registered sockets and callbacks in an inheriting class.
That brings us to our final example. PyZMQ comes with an adapted Tornado eventloop that handles the polling and works with ZMQStreams, which wrap sockets and add some functionality:
# eventloop.py
import multiprocessing

import zmq
from zmq.eventloop import ioloop, zmqstream


# addr is defined as in blocking_recv.py


class Pong(multiprocessing.Process):
    """Waits for ping requests and replies with a pong."""
    def __init__(self):
        super().__init__()
        self.loop = None
        self.stream = None
        self.i = 0

    def run(self):
        """
        Initializes the event loop, creates the sockets/streams and
        starts the (blocking) loop.

        """
        context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()  # This is the event loop

        sock = context.socket(zmq.REP)
        sock.bind(addr)
        # We need to create a stream from our socket and
        # register a callback for recv events.
        self.stream = zmqstream.ZMQStream(sock, self.loop)
        self.stream.on_recv(self.handle_ping)

        # Start the loop. It runs until we stop it.
        self.loop.start()

    def handle_ping(self, msg):
        """Handles ping requests and sends back a pong."""
        # msg is a list of byte objects
        req = msg[0].decode()
        print('Pong got request:', req)
        self.stream.send_unicode('pong %s' % self.i)

        # We'll stop the loop after 5 pings
        self.i += 1
        if self.i == 5:
            self.stream.flush()
            self.loop.stop()
This adds even more boilerplate code, but it will pay off if you use more sockets, and most of that stuff in run() can be put into a base class. Another drawback is that the IOLoop only uses recv_multipart(), so you always get a list of byte strings which you have to decode or deserialize on your own. However, you can use all the send methods socket offers (like send_unicode() or send_json()). You can also stop the loop from within a message handler.
In the next sections, I'll discuss how you could implement a PyZMQ process that uses the event loop.
Communication Design
Before you start to implement anything, you should think about what kind of processes you need in your application and which messages they exchange. You should also decide what kind of message format and serialization you want to use.
PyZMQ has built-in support for Unicode (send sends plain C strings which map to Python byte objects, so there's a separate method to send Unicode strings), JSON and Pickle.
JSON is nice, because it's fast and lets you integrate processes written in other languages into your application. It's also a bit safer, because you cannot receive arbitrary objects as with pickle. The most straightforward syntax for JSON messages is to let them be triples [msg_type, args, kwargs], where msg_type maps to a method name and args and kwargs get passed as positional and keyword arguments.
I strongly recommend that you document each chain of messages your application sends to perform a certain task. I do this with fancy PowerPoint graphics and with even fancier ASCII art in Sphinx. Here is how I would document our ping-pong:
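To illustrate the triple format, here is a small sketch that uses only the standard json module, so it runs without PyZMQ. The Calculator class, its add method and the dispatch function are invented for this example; the check for a leading underscore is an extra precaution I added so that a message cannot name private methods.

```python
import json


class Calculator:
    """Hypothetical handler whose public methods are the message types."""

    def add(self, a, b, scale=1):
        return (a + b) * scale


def dispatch(handler, raw):
    """Decodes a [msg_type, args, kwargs] triple and calls the handler."""
    msg_type, args, kwargs = json.loads(raw)
    if msg_type.startswith('_'):
        raise ValueError('invalid message type: %s' % msg_type)
    method = getattr(handler, msg_type)
    return method(*args, **kwargs)


# What a sender would serialize before putting it on the wire.
raw = json.dumps(['add', [1, 2], {'scale': 10}])
result = dispatch(Calculator(), raw)
```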
Sending pings
-------------

* If the ping process sends a *ping*, the pong process responds with a
  *pong*.
* The number of pings (and pongs) is counted. The current ping count is
  sent with each message.

::

    PingProc      PongProc
     [REQ] ---1--> [REP]
           <--2---

    1 IN : ['ping, count']
    1 OUT: ['ping, count']

    2 IN : ['pong, count']
    2 OUT: ['pong, count']
First, I write some bullet points that explain how the processes behave and why they behave this way. This is followed by some kind of sequence diagram that shows when which process sends which message using which socket type. Finally, I write down what the messages look like. # IN is what you would pass to send_multipart and # OUT is what is received on the other side by recv_multipart. If one of the participating sockets is a ROUTER or DEALER, IN and OUT will differ (though that's not the case in this example). Everything in single quotation marks (') represents a JSON serialized list.
If our pong process used a ROUTER socket instead of the REP socket, it would look like this:
1 IN : ['ping, count']
1 OUT: [ping_uuid, '', 'ping, count']

2 IN : [ping_uuid, '', 'pong, count']
2 OUT: ['pong, count']
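The envelope in the ROUTER variant can be modelled with plain lists of byte strings, which is what the multipart methods work with. The following sketch does not touch any sockets; add_envelope, split_envelope and the identity value are made up for illustration.

```python
import json


def add_envelope(identity, payload):
    """Returns the frames a ROUTER socket would deliver for a request."""
    return [identity, b'', payload]


def split_envelope(frames):
    """Splits received frames into (identity, payload)."""
    identity, delimiter, payload = frames
    if delimiter != b'':
        raise ValueError('expected an empty delimiter frame')
    return identity, payload


ping_uuid = b'ping-1234'  # hypothetical identity of the ping process
request = json.dumps(['ping', 3]).encode()

# 1 OUT: what the pong process would receive
received = add_envelope(ping_uuid, request)
identity, payload = split_envelope(received)

# 2 IN: what the pong process would send back, addressed to the same peer
msg_type, count = json.loads(payload.decode())
reply = [identity, b'', json.dumps(['pong', count]).encode()]
```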
This seems like a lot of tedious work, but trust me, it really helps a lot when you need to change something a few weeks later!
Application Design
In the examples above, the Pong process was responsible for setting everything up, for receiving/sending messages and for the actual application logic (counting incoming pings and creating a pong).
Obviously, this is not a very good design. What we can do about this is to put most of that nasty setup stuff into a base class which all your processes can inherit from, and to put all the actual application logic into a separate (PyZMQ independent) class.
ZmqProcess - The Base Class for all Processes
The base class basically implements two things:
- a setup method that creates a context and a loop
- a stream factory method for streams with an on_recv callback. It creates a socket and can connect/bind it to a given address or bind it to a random port (that's why it returns the port number in addition to the stream itself).
It also inherits from multiprocessing.Process so that it is easier to spawn it as a sub-process. Of course, you can also just call its run() method from your main().
# zmqproc.py
import multiprocessing

from zmq.eventloop import ioloop, zmqstream
import zmq


class ZmqProcess(multiprocessing.Process):
    """
    This is the base for all processes and offers utility functions
    for setup and creating new streams.

    """
    def __init__(self):
        super().__init__()

        self.context = None
        """The ØMQ :class:`~zmq.Context` instance."""

        self.loop = None
        """PyZMQ's event loop (:class:`~zmq.eventloop.ioloop.IOLoop`)."""

    def setup(self):
        """
        Creates a :attr:`context` and an event :attr:`loop` for the process.

        """
        self.context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()

    def stream(self, sock_type, addr, bind, callback=None, subscribe=b''):
        """
        Creates a :class:`~zmq.eventloop.zmqstream.ZMQStream`.

        :param sock_type: The ØMQ socket type (e.g. ``zmq.REQ``)
        :param addr: Address to bind or connect to formatted as *host:port*,
                *(host, port)* or *host* (bind to random port).
                If *bind* is ``True``, *host* may be:

                - the wild-card ``*``, meaning all available interfaces,
                - the primary IPv4 address assigned to the interface, in its
                  numeric representation or
                - the interface name as defined by the operating system.

                If *bind* is ``False``, *host* may be:

                - the DNS name of the peer or
                - the IPv4 address of the peer, in its numeric representation.

                If *addr* is just a host name without a port and *bind* is
                ``True``, the socket will be bound to a random port.
        :param bind: Binds to *addr* if ``True`` or tries to connect to it
                otherwise.
        :param callback: A callback for
                :meth:`~zmq.eventloop.zmqstream.ZMQStream.on_recv`, optional
        :param subscribe: Subscription pattern for *SUB* sockets, optional,
                defaults to ``b''``.
        :returns: A tuple containing the stream and the port number.

        """
        sock = self.context.socket(sock_type)

        # addr may be 'host:port' or ('host', port)
        if isinstance(addr, str):
            addr = addr.split(':')
        host, port = addr if len(addr) == 2 else (addr[0], None)

        # Bind/connect the socket
        if bind:
            if port:
                sock.bind('tcp://%s:%s' % (host, port))
            else:
                port = sock.bind_to_random_port('tcp://%s' % host)
        else:
            sock.connect('tcp://%s:%s' % (host, port))

        # Add a default subscription for SUB sockets
        if sock_type == zmq.SUB:
            sock.setsockopt(zmq.SUBSCRIBE, subscribe)

        # Create the stream and add the callback
        stream = zmqstream.ZMQStream(sock, self.loop)
        if callback:
            stream.on_recv(callback)

        return stream, int(port)
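The address handling inside stream() can be tried out on its own. This sketch pulls the same parsing logic into a standalone function; parse_addr is a name I made up for the example and is not part of the base class. Note that a missing port only makes sense when binding, since connecting needs a concrete port.

```python
def parse_addr(addr):
    """Splits 'host:port', ('host', port) or 'host' into (host, port).

    The port is returned as an int, or as None if addr has no port.
    """
    if isinstance(addr, str):
        addr = addr.split(':')
    host, port = addr if len(addr) == 2 else (addr[0], None)
    return host, (int(port) if port is not None else None)


with_port = parse_addr('127.0.0.1:5678')
as_tuple = parse_addr(('localhost', 5678))
host_only = parse_addr('*')
```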
PongProc - The Actual Process
The PongProc inherits from ZmqProcess and is the main class for our process. It creates the streams, starts the event loop and dispatches all messages to the appropriate handlers:
# pongproc.py
from zmq.utils import jsonapi as json
import zmq

import zmqproc


host = '127.0.0.1'
port = 5678


class PongProc(zmqproc.ZmqProcess):
    """
    Main process for the Ponger. It handles ping requests and sends back
    a pong.

    """
    def __init__(self, bind_addr):
        super().__init__()

        self.bind_addr = bind_addr
        self.rep_stream = None

        # Make sure this is pickle-able (e.g., not using threads)
        # or it won't work on Windows. If it's not pickle-able, instantiate
        # it in setup().
        self.ping_handler = PingHandler()

    def setup(self):
        """Sets up PyZMQ and creates all streams."""
        super().setup()
        self.rep_stream, _ = self.stream(zmq.REP, self.bind_addr, bind=True,
                callback=self.handle_rep_stream)

    def run(self):
        """Sets up everything and starts the event loop."""
        self.setup()
        self.loop.start()

    def stop(self):
        """Stops the event loop."""
        self.loop.stop()

    def handle_rep_stream(self, msg):
        """
        Handles messages from a Pinger:

        *ping*
            Send back a pong.

        *plzdiekthxbye*
            Stop the ioloop and exit.

        """
        msg_type, data = json.loads(msg[0])

        if msg_type == 'ping':
            rep = self.ping_handler.make_pong(data)
            self.rep_stream.send_json(rep)

        elif msg_type == 'plzdiekthxbye':
            self.stop()

        else:
            raise RuntimeError('Received unknown message type: %s' % msg_type)
There are a couple of things to note here:
- I instantiated the PingHandler in the process' __init__ method. If you are going to start this process as a sub-process via start, make sure everything you instantiate in __init__ is pickle-able or it won't work on Windows (Linux and Mac OS X use fork to create a sub-process and fork just makes a copy of the main process and gives it a new process ID. On Windows, there is no fork and the context of your main process is pickled and sent to the sub-process).
- In setup, call super().setup() before you create a stream or you won't have a loop instance for them. You don't call setup in the process' __init__, because the context must be created within the new system process. So we call setup in run.
- The stop method is not really necessary in this example, but it can be used to send stop messages to sub-processes when the main process terminates and to do other kinds of clean-up. You can also execute it if you except a KeyboardInterrupt after calling run.
- handle_rep_stream is the message dispatcher for the process' REP stream. It parses the message and calls the appropriate handler for that message (or raises an error if the message type is invalid). If your if and elif statements all do the same, you might consider replacing them with a dict that contains the handlers for each message type:
    handlers = {
        'msg': self.handler_for_msg,
    }

    try:
        rep = handlers[msg_type](data)
        self.rep_stream.send_multipart(rep)
    except KeyError:
        raise RuntimeError('Received unknown message.')
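The first note above, about keeping everything created in __init__ pickle-able, is easy to check up front. Here is a small sketch using only the standard library; is_picklable is a hypothetical helper and the two dicts are stand-ins for handler state, one plain and one holding a thread lock (which cannot be pickled).

```python
import pickle
import threading


def is_picklable(obj):
    """Returns True if obj survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


plain_state = {'num_pings': 0}                # fine to create in __init__
locked_state = {'lock': threading.Lock()}     # better created in setup()

plain_ok = is_picklable(plain_state)
locked_ok = is_picklable(locked_state)
```

A check like this could go into a unit test for each process class, so that a non-pickle-able attribute is caught on Linux or Mac OS X before it fails on Windows.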
PingHandler - The Application Logic
The PingHandler contains the actual application logic (which is not much, in this example). The make_pong method just gets the number of pings sent with the ping message and creates a new pong message. The serialization is done by PongProc, so our Handler does not depend on PyZMQ:
    class PingHandler(object):

        def make_pong(self, num_pings):
            """Creates and returns a pong message."""
            print('Pong got request number %s' % num_pings)

            return ['pong', num_pings]
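Because the handler knows nothing about PyZMQ, it can be exercised with plain Python objects. Here is a small sketch, assuming the standard json module as a stand-in for the serialization that PongProc performs; the byte strings only imitate what would travel over the socket:

```python
import json


class PingHandler(object):
    """Same logic as above, repeated so the snippet is self-contained."""

    def make_pong(self, num_pings):
        print('Pong got request number %s' % num_pings)
        return ['pong', num_pings]


# Imitate the frame a pinger would send: a JSON list [msg_type, data].
request = json.dumps(['ping', 3]).encode()

# What the process class does: deserialize, delegate, serialize.
msg_type, data = json.loads(request.decode())
reply = json.dumps(PingHandler().make_pong(data)).encode()
```

Since no socket or event loop is involved, a check like this can run as an ordinary unit test.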
Summary
Okay, that's it for now. I showed you three ways to use PyZMQ. If you have a very simple process with only one socket, you can easily use its blocking recv methods. If you need more than one socket, I recommend using the event loop. And polling … you don't want to use that.
If you decide to use PyZMQ's event loop, you should separate the application logic from all the PyZMQ stuff (like creating streams, sending/receiving messages and dispatching them). If your application consists of more than one process (which is usually the case), you should also create a base class with shared functionality for them.
In the next part, I'm going to talk about how you can test your application.
04 Feb 2012 1:34pm GMT
Daniel Greenfeld: Resolutions for 2012
- Go to a Python related conference in North America, South America, Europe, Asia, Africa, Australia, and New Zealand.
- Attend at least one JavaScript related conference or event.
- Upload all my outstanding pictures to Flickr!
- Make Consumer Notebook profitable.
- Find more ways to make Audrey Roy happy.
- Pull off an Aú sem Mão during a Capoeira Roda.
- Attend my first Capoeira Batizado.
- See a place in the USA I've never been.
- Work out at least three times a week.
- Drop to a 32 waist
- Visit friends and family back east. Been over a year since I've seen my sister!
- Blog once a week. That is at least 52 blog entries!
- Visit a Theme park.
- Learn how to surf or snowboard.
- Implement something in node.js, backbone.js, and handlebars.js
- Take a high-level Python class from the likes of Raymond Hettinger or David Beazley.
- Teach some Python or Django.
- Have a beer with Thomas, Andy, Andy, Tony, Garrick, Bernd, and the rest of Ye Aulde Gange.
- See my old DC area friends such as Eric, Chris, Steve, Beth, Sarah, Daye, Renee, Kenneth, Leslie, Whitney, Dave, and many others.
- Visit my Son.
04 Feb 2012 6:59am GMT
03 Feb 2012
Planet Python
Mike C. Fletcher: What to play with?
I'm hoping to have a few weeks to work on my own projects before I dive into working on other people's projects again (that might not pan out, but I'm hoping), so, here's a brain-dump of what I'm considering playing with:
- write a simple, generic shader-based legacy-free scenegraph engine (basically transplant the modern parts from OpenGLContext and leave behind the old crud, then translate the core into C/C++)
- turn Sillescope into an Android app (that shouldn't take too long, I just got annoyed at the GLES limitations last time and stopped 1/2 way)
- learn Haskell (though the "Haskell for Python Programmers" article honestly left me thinking "this is dumb" much of the time)
- contribute to a game engine (maybe Ogre, maybe 0AD)
- add a GLES binding to PyOpenGL
- play with PyPy now that I have a machine that can compile it
- build a basic HTML5 Canvas or WebGL game engine
- update and modernize StarPy (I think no, as I have spent the last 18 months on VoIP and Django)
- update Django-jqm with the latest jQuery Mobile, provide a JQM admin interface (again, not likely, just spent the last 18 months in Django)
Any other suggestions? I'm not currently concerned about utility or practicality, just fun things with which to spend a few weeks to recharge my programming-enjoyment batteries.
03 Feb 2012 9:02pm GMT
Juho Vepsäläinen: Blog Highlights of '11
It looks like this year is nearing its end. Thanks for tagging along! I thought it might be fun to write a post that highlights some of the nicer posts I wrote this year. So far I've been blogging for around two and a half years.
I think blogging is slowly starting to show its advantages. Just a while ago I needed to solve a certain Django-specific problem. After googling around I happened to find the solution on my own blog. In essence this blog serves as a kind of auxiliary memory of mine. As a side benefit some other people might find my ramblings useful too. This in turn might lead to new opportunities. Blogging is definitely a good way to market yourself if you're into that sort of thing.
There has been some talk on whether or not blogging is dying. The basic premise is that social media such as Facebook and Twitter are eating into its popularity. That's probably partially true. I believe blogs will continue to have some influence. After all you'll need something to discuss and tweet about. Most importantly blogs are more permanent by nature. It's easier to refer back to some concrete blog post than to some obscure Twitter conversation from ages ago. Different media serve different purposes.
Now that I got the intro bit out of the way, let's take a look at the year. Quite a few things happened. While at it I'll try to outline some possible ideas for the next one. It's not like I'm running out of ideas. On the contrary. There's still plenty of material left I need to get out there sooner or later.
03 Feb 2012 8:45pm GMT
Lightning Fast Shop: Release 0.6.5
We just released LFS 0.6.5. This is yet another bugfix release.
Changes
- Bugfix: added csrftoken for rating mails (Maciej Wiśniowski)
- Bugfix: fixed ImageWithThumbsField (Maciej Wiśniowski)
- Updated Romanian translations (olimpiu)
- Updated Polish translations (Maciej Wiśniowski)
News:
- We have set up a GitHub mirror of LFS.
- The docs are running on our own domain now (still hosted on RTD) and have a new layout: http://docs.getlfs.com/
Information
You can find more information and help at the following locations:
03 Feb 2012 6:45pm GMT
PyCon: PyCon US 2012: You want hotel? We have hotels!
As noted in the previous post - we had a minor blip regarding the PyCon 2012 hotel - by minor blip, I mean we completely booked the Hyatt (our main hotel).
This issue has been resolved without needing me to resort to going to Home Depot and buying "The Dummies Guide to Hotel Building".
We now have plenty of rooms at:
Update 2/3/12:
Hilton Santa Clara - NOW FULL. Within walking distance of the venue. These rooms are marked as $159/night; however, as we want to do what's right for attendees, we have asked the Hilton to credit each room night booked under our block $10, while the PyCon master account will absorb the $10 additional cost. This means that the base room rate for attendees will be $149/night, matching the cost for the Hyatt.
The Avatar Hotel, Santa Clara - STILL AVAILABLE. This one is not within walking distance; however, the cost per night is $149/night, matching our other rates, and we have negotiated a free shuttle for attendees to and from the Santa Clara Convention Center.
The Marriott Santa Clara - STILL AVAILABLE. Again, we are maintaining the room night cost, and while it too is not within walking distance, we will have a free shuttle to and from the convention center!
I must note: All of these agreements include room minimums - this means that PyCon will get charged a lot of money if we do not book the blocks we have contracted for.
In order for us to get credit for your hotel bookings, you must book through our registration and housing system at: https://us.pycon.org/2012/registration/register/ - or by contacting our housing bureau at pycon5-reg@cteusa.com or by phone at 847-759-4277.
Please book your rooms through us, and please book as soon as you can! You'll not only get a room for the conference - you'll help us out, and be a part of what is already the biggest PyCon on record. If you haven't registered, you need to - registration is capped at 1500 attendees, and by all estimates, we are going to hit that number soon. (All financial aid recipients are accounted for in attendance and hotel, by the way.)
Jesse Noller, PyCon Chair.
03 Feb 2012 3:29pm GMT
31 Jan 2012
Python Software Foundation | GSoC'11 Students
Benedict Stein: Xhosa in Debian
Today I had to add a missing US locale, which brought up the idea of trying Xhosa as well.

31 Jan 2012 12:55pm GMT
Wojciech Wojtyniak: 137
It was Richard Feynman, in fact, who suggested that all physicists should put up a sign in their offices or homes to remind them of how much we don't know. The sign would say simply 137. One hundred and thirty-seven is the inverse of something called the fine-structure constant. This number is related to the probability that an electron will emit or absorb a photon. The fine-structure constant also answers to the name alpha, and it can be arrived at by taking the square of the charge of the electron divided by the speed of light times Planck's constant. What all that verbiage means is that this one number, 137, contains the crux of electromagnetism (the electron), relativity (the velocity of light), and quantum theory (Planck's constant). It would be less unsettling if the relationship between all these important concepts turned out to be one or three or maybe a multiple of pi. But 137?
I tell my undergraduate students that if they are ever in trouble in a major city anywhere in the world they should write "137" on a sign and hold it up at a busy street corner. Eventually a physicist will see that they're distressed and come to their assistance. (No one to my knowledge has ever tried this, but it should work.)
Leon Lederman, "The god particle: if the universe is the answer, what is the question?"
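The number in the quote can be checked with a few lines of arithmetic. This is a minimal sketch, assuming SI units, where the definition carries an extra factor of 4πε₀ that the quote's wording (which matches Gaussian units) leaves out. The constant values are the CODATA 2018 figures, and the function name is only illustrative.

```python
import math

# CODATA 2018 values in SI units (assumed here, not taken from the post)
ELEMENTARY_CHARGE = 1.602176634e-19      # coulombs
REDUCED_PLANCK = 1.054571817e-34         # joule seconds
SPEED_OF_LIGHT = 299792458.0             # metres per second
VACUUM_PERMITTIVITY = 8.8541878128e-12   # farads per metre


def fine_structure_constant():
    """alpha = e^2 / (4 * pi * epsilon_0 * hbar * c), a dimensionless number."""
    return ELEMENTARY_CHARGE ** 2 / (
        4 * math.pi * VACUUM_PERMITTIVITY * REDUCED_PLANCK * SPEED_OF_LIGHT
    )


alpha = fine_structure_constant()
inverse_alpha = 1 / alpha
print("1/alpha is roughly", round(inverse_alpha, 3))
```

The inverse is not exactly an integer; 137 is the rounded value, which is also 0x89 in hexadecimal.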
[Image: Richard Feynman]
So here we are, 0x89.net.
Welcome!
31 Jan 2012 12:40am GMT
30 Jan 2012
Python Software Foundation | GSoC'11 Students
Benedict Stein: Renaming Network Devices in Linux
While reinstalling and cloning the X2GoServer for Lumanyano Primary School, I had to rename the network devices to fit the available configuration.
Our preinstalled network settings offer the PXE boot server on eth0, which should be the onboard RJ45 port. In addition, eth1 is configured for static internet usage, eth2 as a DHCP client for internet, and WLAN for static internet.
After inserting an Intel NetBios network card, the new card was assigned eth0 - but the change is simple - see screenshot on my private PC below - just change the name eth0 → eth1 ...
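Since the screenshot is not reproduced here, the following is only a sketch of one way such a rename can be done. It assumes the names are pinned in a udev rules file (on Debian systems of that time typically /etc/udev/rules.d/70-persistent-net.rules) with one `NAME="..."` assignment per MAC address. The MAC addresses, the sample rules, and the helper `set_name_for_mac` are hypothetical.

```python
import re

# Matches the assignment NAME="..." but not the comparison NAME=="..."
NAME_ASSIGNMENT = re.compile(r'\bNAME="[^"]*"')


def set_name_for_mac(rules_text, mac_address, new_name):
    """Return rules_text with the interface name changed on the rule for mac_address."""
    wanted = ('ATTR{address}=="%s"' % mac_address).lower()
    updated_lines = []
    for line in rules_text.splitlines():
        is_comment = line.lstrip().startswith("#")
        if not is_comment and wanted in line.lower():
            line = NAME_ASSIGNMENT.sub('NAME="%s"' % new_name, line)
        updated_lines.append(line)
    return "\n".join(updated_lines)


# Hypothetical rules: the add-in card grabbed eth0, the onboard port got eth1
sample_rules = "\n".join([
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:55", KERNEL=="eth*", NAME="eth0"',
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="66:77:88:99:aa:bb", KERNEL=="eth*", NAME="eth1"',
])

# Swap the two names by addressing each rule through its MAC address
updated = set_name_for_mac(sample_rules, "00:11:22:33:44:55", "eth1")
updated = set_name_for_mac(updated, "66:77:88:99:AA:BB", "eth0")
print(updated)
```

Selecting the rule by MAC address rather than by the old name avoids the two renames colliding with each other. The new names normally take effect only after a reboot or a reload of the network driver.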
30 Jan 2012 12:37pm GMT
29 Jan 2012
Python Software Foundation | GSoC'11 Students
Vlad Niculae: Nash-Williams theorem on the Hamiltonian property of some regular graphs
I have been digging on the internet for the proof of this theorem for the last couple of days without success. The result was published by Sir Crispin Nash-Williams as "Valency Sequences which force graphs to have Hamiltonian Circuits", Interim Rep, University of Waterloo Res Rep., Waterloo, Ontario, 1969. This old paper is unavailable online, but I have a proof in some lecture notes from my class that I want to share here.
Theorem. Let \(G=(V, E)\) be an \(n\)-regular graph with \(|V| = 2n + 1\). Then, \(G\) is Hamiltonian.
Proof. We first remark that \(n\) must be even, since $$\sum_{x \in V} d(x) = n(2n + 1) = 2|E|$$ We might try to apply Dirac's theorem, which would give us a Hamiltonian cycle if \( \forall x \in V, d(x) \geq \frac{|V|}{2}\). But in the current case, \(\forall x \in V, d(x) = n < \frac{2n+1}{2}\).
So we force Dirac by adding an extra vertex \(w\) and connecting it to all \( x \in V \). In this new graph \(G'\), \(d(x) = n + 1 \forall x \in V\) and \(d(w) = 2n + 1\). Therefore we have a Hamiltonian cycle that passes through \(w\) and in which, \(w\) is adjacent to two vertices \(x\) and \(y \in V\). Therefore this cycle induces a Hamiltonian path in \(G\): $$P = [x = v_0, v_1, ..., v_{2n-1}, v_{2n}=y] $$
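Suppose that \(G\) is not Hamiltonian. It follows that if \( v_0v_i \in E \), then \( v_{i-1}v_{2n} \notin E\), since otherwise \([v_0, v_1, ..., v_{i-1}, v_{2n}, v_{2n-1}, ..., v_i, v_0]\) would be a Hamiltonian cycle. This rules out \(n\) of the \(2n\) possible neighbors of \(v_{2n}\), and since \(d(v_{2n}) = n\), all the remaining ones must be neighbors. So we also have that if \( v_0v_i \notin E \), then \( v_{i-1}v_{2n} \in E\).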
We have two cases. If \(v_0\) is adjacent to \(v_1, ..., v_n\) then it follows that \(v_{2n}\) is adjacent to \(v_n, v_{n+1}, ..., v_{2n-1}\), since it cannot be adjacent to any \(v_i, i < n\) without creating a Hamiltonian cycle. But in this case, in the graph induced by the first half \(G[\{v_0, v_1, ... v_n\}]\), \(v_n\) cannot be adjacent to all the others, since in \(G\) it has degree \(n\) and it already has \(2\) outgoing edges. So there is at least one vertex \(v_i, i < n\) that isn't adjacent to it, which means \(v_i\) is adjacent to some \(v_j, j > n\), thus forming a Hamiltonian cycle.
In the second case, we have a vertex \(v_i, 2 \leq i \leq 2n - 1\) such that \(v_0v_i \notin E\) and \(v_0v_{i+1} \in E\). This also means that \(v_{i-1}v_{2n} \in E\).
We therefore have a cycle of length \(2n\) in \(G\) that excludes \(v_i\). Let's rename this cycle \(C=[y_1, y_2, ..., y_{2n}, y_1]\) and \(v_i=y_0\).
\(y_0\) cannot be adjacent to two consecutive vertices \(y_i\) and \(y_{i+1}\) because this would give a Hamiltonian cycle. But we know that \(\deg(y_0) = n\). It follows that it's adjacent to all of the even or all of the odd numbered vertices. We assume the latter, without loss of generality. Let \(2k\) be some even index. Notice that we have \(\{y_0y_{2k-1}, y_0y_{2k+1}\} \subset E\) and we can follow the cycle \(C\) from \(y_{2k+1}\) all the way back to \(y_{2k-1}\), giving us a new cycle \(C' = [y_1, y_2, ..., y_{2k-1}, y_0, y_{2k+1}, ..., y_{2n}, y_1]\), also of length \(2n\), that excludes \(y_{2k}\). So by repeating the same reasoning for every even vertex, by placing it in the middle and building a cycle around it, it follows that every even vertex is adjacent to all the odd vertices. But there are \(n+1\) even indices, so it follows that the degree of any odd vertex is at least \(n+1\), contradicting the initial conditions of the theorem. \(\square\)
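As a sanity check of the statement on one small instance (not a substitute for the proof), here is a minimal sketch that builds an \(n\)-regular graph on \(2n+1\) vertices for \(n = 4\) and searches for a Hamiltonian cycle by backtracking. The circulant graph is my own choice of example and the function names are illustrative, neither comes from the lecture notes.

```python
def circulant_graph(num_vertices, offsets):
    """Adjacency sets where vertex v is joined to v + k and v - k (mod num_vertices)."""
    adjacency = {v: set() for v in range(num_vertices)}
    for v in range(num_vertices):
        for k in offsets:
            adjacency[v].add((v + k) % num_vertices)
            adjacency[v].add((v - k) % num_vertices)
    return adjacency


def hamiltonian_cycle(adjacency):
    """Return a list of vertices forming a Hamiltonian cycle, or None if there is none."""
    vertices = list(adjacency)
    start = vertices[0]
    path = [start]
    visited = {start}

    def extend():
        # All vertices used: it is a cycle only if the last one links back to start
        if len(path) == len(vertices):
            return start in adjacency[path[-1]]
        for candidate in sorted(adjacency[path[-1]]):
            if candidate not in visited:
                visited.add(candidate)
                path.append(candidate)
                if extend():
                    return True
                path.pop()
                visited.remove(candidate)
        return False

    return path if extend() else None


n = 4
graph = circulant_graph(2 * n + 1, offsets=(1, 2))

# The hypotheses of the theorem: |V| = 2n + 1 and every degree equals n
assert len(graph) == 2 * n + 1
assert all(len(neighbors) == n for neighbors in graph.values())

cycle = hamiltonian_cycle(graph)
print(cycle)
```

The backtracking search is exponential in the worst case, so this is only practical for small graphs.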
29 Jan 2012 8:39pm GMT
23 Jan 2012
Python Software Foundation | GSoC'11 Students
Benedict Stein: Wireless network
Today Nozuko learned about wireless networks for the first time. WPA, 802.11, ISR configuration: that's all new to her, but essential before we proceed with internet sharing solutions, which partly depend on wireless bridges and a 3G modem.

23 Jan 2012 1:53pm GMT
18 Jan 2012
Python Software Foundation | GSoC'11 Students
Sara Kazemi: Hooray!
I've been taking 20mg of Lexapro now for a few days and I feel so much better. This is despite being more stressed lately due to the anticipation of teaching for the first time (by the way, the first day went great). Hooray! I also got in for free counseling with one of the therapists at school. Things are looking up!
18 Jan 2012 9:54pm GMT
Benedict Stein: Hilltop Hardware Donations
For those of you who don't know who I'm working for, I've created a small presentation showing some parts around Hilltop Empowerment Center. If someone wants to use this presentation to present it, embed it somewhere else or share it, feel free to do so, but take care of the "© benste CC NC SA" which is explained in the footer of my Blog.
18 Jan 2012 11:46am GMT
Benedict Stein: Creating and restoring partitions with partimage
Our idea
- Use partimage to run a command which will create and/or restore an image of a local partition on a different local partition
- Move this process to be stored on a local server and accessible for many PCs in one room
Benefits of using partimage
Can easily restore images on client computers locally or from a server, which could reverse damage caused by viruses if properly executed
Run Debian Rescue - a system which is preinstalled on our computers and has partimage installed
Log in to the shell
Execute the following to create a new Image
- -z = compression from 0 to 2
- -d = Don't ask for any description of the image file
- -o = overwrite existing files without confirmation
- -b = Batch Mode - GUI won't wait for user interaction
- the partition to use (e.g. hda1)
- the name of the image file (e.g. winxp_lynx)
Restoring an image of a local partition on a different local partition
Execute the following command
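The exact command lines did not survive in this copy of the post, so the following is only a sketch of how such calls could be assembled from the options described above. It assumes partimage's usual `partimage [options] save|restore <device> <image file>` calling convention. The helper name and the /mnt/images path are hypothetical.

```python
def partimage_command(operation, partition, image_path, compression=1):
    """Build the argument list for a partimage save or restore in batch mode."""
    if operation not in ("save", "restore"):
        raise ValueError("operation must be 'save' or 'restore'")
    if compression not in (0, 1, 2):
        raise ValueError("compression must be 0, 1 or 2")

    command = ["partimage", "-b"]           # -b: batch mode, no waiting for the user
    if operation == "save":
        command += [
            "-z%d" % compression,           # -z: compression level from 0 to 2
            "-d",                           # -d: don't ask for an image description
            "-o",                           # -o: overwrite existing files
        ]
    command += [operation, partition, image_path]
    return command


# Hypothetical device and image location
save_cmd = partimage_command("save", "/dev/hda1", "/mnt/images/winxp_lynx")
restore_cmd = partimage_command("restore", "/dev/hda1", "/mnt/images/winxp_lynx")

print(" ".join(save_cmd))
print(" ".join(restore_cmd))

# To actually run one of them (as root, from the rescue system):
#     import subprocess
#     subprocess.run(save_cmd, check=True)
```

When restoring, check the actual file name on disk first, because partimage typically appends a volume number such as .000 to the image name it was given.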
18 Jan 2012 10:24am GMT


