Thursday, September 6, 2012

EuroSciPy 2012 in Brussels - A Special Conference

A Great Conference
At the end of August 2012, the EuroSciPy conference took place in Brussels, Belgium. This was edition number five, after two conferences in Leipzig, Germany in 2008 and 2009 and two in Paris, France in 2010 and 2011. Next year, the event will take place in Brussels again.

The whole conference took four days: two days of tutorials followed by two days of talks.1 This is quite unusual for a conference, where the talk part is normally predominant. In fact, there was roughly twice as much tutorial time as talk time, because two tutorial sessions proceeded in parallel: an introductory track and an advanced track. The tutorials were great. You get lots of hands-on training in a very short period of time.

Tutorial Day 1 (Advanced only)
First thing on Thursday, I gave a tutorial about combining Cython and NumPy and going parallel with OpenMP. It is essentially a one-day course on this subject compressed into one hour and fifteen minutes. Admittedly, it is not the simplest topic, especially if you are new to Cython and, even more so, to NumPy. But after all, it was in the advanced track, and the 100+ people in the room must have got something out of it. At least the feedback was good.
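To give a flavour of the topic, here is a minimal sketch of my own (not taken from the tutorial material) of how Cython's prange distributes a loop over a NumPy array across OpenMP threads while releasing the GIL:

    # parallel_sum.pyx -- illustrative sketch; requires Cython and a compiler with OpenMP
    # cython: boundscheck=False, wraparound=False
    from cython.parallel import prange

    def parallel_sum(double[:] data):
        """Sum a 1D array with OpenMP threads; the loop body runs without the GIL."""
        cdef Py_ssize_t i
        cdef double total = 0.0
        for i in prange(data.shape[0], nogil=True):
            total += data[i]   # Cython turns this into an OpenMP reduction
        return total

The extension has to be compiled and linked with OpenMP flags (e.g. -fopenmp with GCC) for the threads to actually kick in.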

For the rest of the two tutorial days, I could enjoy the advanced tutorial track. In the second tutorial of the morning, Francesc Alted talked about "Numexpr, Blosc and CArray" and made clear that the bottleneck is often not the CPU but rather how fast you can get your data from memory to the CPU. Caching and memory layout are very important; all the horsepower won't help if you have to wait at red lights all the time. CArray is a chunked array that allows compression. It can save a lot of memory if the array is large and rather regular, i.e. contains repeated numbers. Blosc for compression and Numexpr for the calculation make it really fast, in some special cases even faster than contiguous NumPy arrays. Furthermore, it also allows for disk-based arrays, in case your memory is not big enough for your data but you still want to treat it as if it were all in memory.
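A tiny example of the Numexpr part (my own sketch, not from the tutorial): Numexpr evaluates an expression string in cache-sized blocks, which keeps the data close to the CPU and avoids the large temporary arrays that plain NumPy creates:

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10 ** 7)
    b = np.random.rand(10 ** 7)

    # Plain NumPy allocates several full-size temporary arrays ...
    result_np = 2.0 * a + 3.0 * b ** 2

    # ... while Numexpr evaluates the whole expression block-wise,
    # optionally spread over several threads.
    result_ne = ne.evaluate("2.0 * a + 3.0 * b ** 2")

    assert np.allclose(result_np, result_ne)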

In the afternoon, Ian Ozsvald gave a very nice overview of how to do parallel computing in Python in his tutorial Parallel computing with Multiprocessing, ParallelPython and IPython. He has a lot of experience in this field. Since his focus was on approaches for distributed computing, it was a good complement to my tutorial about shared-memory computation with OpenMP.
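As a flavour of the multi-process approach (again my own minimal sketch, not taken from the tutorial), the standard-library multiprocessing module side-steps the GIL by spreading work over separate processes:

    import math
    from multiprocessing import Pool

    def count_primes(bounds):
        """Count primes in the half-open interval [lo, hi) by trial division."""
        lo, hi = bounds
        def is_prime(n):
            return n >= 2 and all(n % d for d in range(2, int(math.sqrt(n)) + 1))
        return sum(1 for n in range(lo, hi) if is_prime(n))

    if __name__ == "__main__":
        chunks = [(i, i + 250000) for i in range(0, 1000000, 250000)]
        with Pool() as pool:                       # one worker process per CPU core by default
            counts = pool.map(count_primes, chunks)
        print(sum(counts))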

Optimization from the SciPy perspective was the topic of the tutorial Better numerics with SciPy by Gaël Varoquaux. This was very interesting because it focused on the science but also made clear what the SciPy package can do and what is still missing. Optimization is general enough to be appealing to many people at the conference.
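For illustration (my own sketch, not from the tutorial), a typical scipy.optimize task is fitting model parameters to noisy measurements:

    import numpy as np
    from scipy import optimize

    def model(x, amplitude, decay):
        """A simple exponential decay model."""
        return amplitude * np.exp(-decay * x)

    # Synthetic noisy data
    rng = np.random.RandomState(0)
    x = np.linspace(0, 4, 50)
    y = model(x, 2.5, 1.3) + 0.1 * rng.randn(x.size)

    # Least-squares fit of the model parameters to the data
    params, covariance = optimize.curve_fit(model, x, y, p0=(1.0, 1.0))
    print(params)  # roughly (2.5, 1.3)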

Tutorial Day 2 (Advanced only)
How to make writing a GUI program simple was the topic of the tutorial Enaml is not a Markup Language by Didrik Pinte. Enaml looks a bit like YAML but allows you to create powerful GUIs for the wxPython and PySide/PyQt backends. There are a lot of goodies in it that help to solve common problems, like two-way communication between data and widgets, with very little code, I mean markup.

Working with data, especially Big Data, in Python? Then pandas is a must. Wes McKinney showed the most prominent features of his library in his tutorial Time Series Data Analysis with Pandas. It is impressive how easy it is to work with dates and the always-so-nasty missing values. Wes created a great piece of software and even wrote a book about it.
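A tiny sketch of the kind of thing pandas makes easy (my own example, not from the tutorial): putting measurements on a date index and dealing with the gaps:

    import numpy as np
    import pandas as pd

    # A daily time series with two missing measurements
    index = pd.date_range("2012-08-23", periods=7, freq="D")
    series = pd.Series([1.0, 1.2, np.nan, 1.5, np.nan, 1.9, 2.1], index=index)

    filled = series.ffill()                     # carry the last observation forward
    weekly_mean = series.resample("W").mean()   # weekly means, NaNs simply ignored
    print(filled)
    print(weekly_mean)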

Pietro Berkes gave a good introduction to the unittest module from the standard library in his tutorial Writing robust scientific code with testing (and Python). Personally, I prefer py.test, which I think is way more powerful, simpler to use and more pythonic. He argues that it is good to use modules you don't have to install. But scientific users are used to installing third-party libraries all the time, and compiling C extensions is a piece of cake for them. So a pure-Python package that supports a wide range of Python versions and many Python implementations should be just a pip install away.
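To show what that preference is about (my own sketch, not from the tutorial), here is the same check once with unittest and once in py.test style, which gets by with a plain function and a bare assert:

    import math
    import unittest

    def relative_error(approx, exact):
        """Relative error of an approximation -- the code under test."""
        return abs(approx - exact) / abs(exact)

    class TestRelativeError(unittest.TestCase):   # unittest: class, method, assertion helper
        def test_pi_approximation(self):
            self.assertAlmostEqual(relative_error(22 / 7, math.pi), 4.025e-4, places=6)

    def test_pi_approximation():                  # py.test: a plain function and a bare assert
        assert abs(relative_error(22 / 7, math.pi) - 4.025e-4) < 1e-6

    if __name__ == "__main__":
        unittest.main()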

A new perspective on packaging was presented by David Cournapeau in his tutorial Bento, a pythonic packaging system for python software. Taking into account the special requirements of scientific Python packages and incorporating existing approaches such as PyPI, pip and virtualenv, Bento looks like a good solution for many packaging needs.

Keynotes
If you have ever heard a presentation by David Beazley, then you know that you will learn something new, get your dose of diabolic things you can do just for the fun of it, and have a great time, as the talk will be as entertaining as a technical presentation can get. In his keynote on Saturday, Rethinking Extension Programming, he went back to the early '90s and his time as a researcher, and talked about his work with physics simulation programs on parallel machines. While the execution of the code was fast, the workflow for pre- and post-processing was horribly inefficient, so he essentially developed his own scripting language, only to learn that Python already existed. ;) As a lesson from his work on SWIG, he does not think it is a good idea to wrap Python around C, C++ or FORTRAN, where Python is just the little language that makes the "real" code a bit easier to use. Just the opposite should be the case: Python is the real language and the compiled languages are just helpers. LLVM, used from Python, does just this. From Python you write native code that gets translated on the fly into machine instructions. So Python is in the lead and the compiled code follows. He gave a live-coding demo that worked so well that he had a hard time provoking a segfault just to show that you are working without the Python safety net here. Overall, it was a great pleasure to attend his talk, and it was not as diabolic as I had expected.
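For illustration (my own example, not from the keynote), Numba is one library built on exactly this model: Python stays in the lead and LLVM generates the machine code on the fly when the function is first called:

    import math
    import numpy as np
    from numba import jit

    @jit(nopython=True)      # compiled to native code via LLVM on the first call
    def pairwise_distance_sum(points):
        """Sum of all pairwise Euclidean distances, written as plain Python loops."""
        n = points.shape[0]
        total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                d = 0.0
                for k in range(points.shape[1]):
                    diff = points[i, k] - points[j, k]
                    d += diff * diff
                total += math.sqrt(d)
        return total

    points = np.random.rand(500, 3)
    print(pairwise_distance_sum(points))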

The second keynote, on Sunday, by Eric Jones was about Making the Case for Python (note: I made this title up because I don't remember the exact title and cannot find it on the net). He spoke from the perspective of the leader of a software company with a scientific background who "sells" Python to large institutions and companies. He enumerated common arguments against Python, like it is too slow, the GIL, who uses Python, no typing etc., and gave examples of how to answer such questions. While the arguments are all too common to many long-term Python users, the experience from about ten years of Python in the corporate world offers some new insights. As a side effect, it turned out that the audience of this conference was about 50% academia, 25% large companies, and 25% small companies and freelancers.

Talks Day 1
While the tutorials are about more general topics that can be useful for the daily work of a majority of the participants, the talks can become quite specialized. After all, experts from very different disciplines such as physics, biology, geoscience, engineering, finance or linguistics talk about their latest research findings. I guess most of the people already have a Ph.D., are working on their degree, or plan to do so.

The first talk, scikits-image: Image processing in Python by Stéfan van der Walt, introduced this impressive image processing library. In short, it is not about getting rid of red eyes in party photos but about getting information for scientific purposes out of images. Almar Klein presented a new 3D library in pure Python, Visvis - an object oriented approach to visualization. If you don't need the full power of MayaVi, this might be for you. Alexandros Kanterakis showed how to apply Wikipedia principles to writing source code in his talk PyPedia: A crowdsourcing python online IDE for open and reproducible science. Rickard Holmberg gave a nice example of using IronPython, including the associated problems, in the talk IronPython scripting in a radiation therapy treatment planning system. Fortunately, Python is not nearly as complex as a schematic of cellular processes, but you can still simulate them with Python, as Johann Rohwer showed in his talk PySCeS: the Python Simulator for Cellular Systems. Brett Olivier followed up, using PySCeS together with other Python tools, in his talk Pathway and Cells: Systems Biology Modelling with Python. Machines can learn, but they do this very differently from humans. There are a lot of powerful tools available in this area for Python, as Jaques Grobler made clear in his talk New developments with Scikit-learn: machine learning in Python. Electric cars are the future and they need to be charged regularly. Simulating this with Python helps to figure out what to expect, as shown by Stefan Scherfke in his talk SimPy – An Introduction and a Real-World Example with Electric Vehicles.

Talks Day 2
And now for something completely different. Steven Moran taught a 10-minute "Linguistics 101" course and showed how quantitative methods can be used in linguistics from Python in his talk A Python Library for Historical-Comparative Linguistics. Python has a strong foothold in finance, as Yves Hilpisch demonstrated in Python for Finance. The audience voted the talk by Simon Ratcliffe about Python and the MeerKAT Radio Telescope the best presentation. No surprise, Python is used literally everywhere in this project. "No need to vote any more. Elections were yesterday. We just need to analyze Twitter data." This could be a tabloid headline simplifying the great project Laurent Luce presented in his talk Pytolab: Twitter statistics on the 2012 French presidential election. Different topic please! How about earthquakes? What damage they can cause can be calculated, of course with Python, as Anton Gritsay explained in nhlib - a library for seismic hazard analysis. Large-scale physics, anyone? The talk by Mark Basham, Development of the Opt-ID tool within the SDA/DAWN IDE, showed impressively how Python can help to optimize the construction of a synchrotron and, just as a side effect, outperform existing FORTRAN code by several orders of magnitude, because things are so obvious with Python. I gave the last talk, No GIL - Parallel Python Programming with Cython and OpenMP, showing off new Cython features that use OpenMP for GIL-less, truly parallel threads in Python.

Posters
Posters are a great way to present your topic, and you have time to address the questions of your audience in a one-to-one fashion. Scientists are used to posters at conferences, hence there were more posters than talks. Each poster presenter had exactly one minute for a teaser in front of the entire audience to say what his or her poster is about. This helps a lot to understand what to expect, especially at a conference with such a diverse spectrum of participants as this one. There were too many posters to start mentioning them here. The best, as chosen by the audience, was the one about memory_profiler by Fabian Pedregosa. Have a look at the List of Abstracts to read about all the others. Many of them are pretty deep in their fields. For example, I do have problems visualizing eight-dimensional space. I don't know about you. ;)

Lightning Talks 
While posters are a typical science thing, lightning talks come from the programming community. I don't know any science conference with lightning talks, well, I mean except computer science. ;) But since both communities intersect here, we got them both. These three minutes of fame covered very different topics, like introducing a project, asking people to help develop MayaVi, informing about startup opportunities in Chile, or advertising for PyConDE.

People
Many consider the hallway track the most important one of them all. You can always watch talk videos and read slides or use all the means of remote communication, but meeting people in real life is still very different. I met people whom I have known since the first EuroSciPy in 2008 or one of the other previous events; there were many familiar faces. I also met plenty of new people who were there for the first time.

There is no need to evangelize these people about Python. They already know all the advantages of Python. But it is good to see people from such diverse backgrounds using Python for things you may have only a remote idea about, or none whatsoever. Still, the discussions are interesting, and Python is enough common ground to communicate and learn something new. Meeting folks from communities you typically have little contact with helps to keep your mind open.

Brussels
Brussels is a nice city for a conference. The atmosphere is pleasant, the buildings look good, there are plenty of restaurants with good seafood, and the beer has about twice the alcohol content of what I consider average. That is how I experienced Brussels, and what I heard from the other conference-goers sounds pretty similar.

We had several social events exploring Brussels' cuisine and beer culture. The even more relaxed atmosphere laid the ground for interesting conversations and marked the beginning of one or another collaboration.

Thanks
Organizing a conference is quite a task. I know this from my own experience as the main organizer of the first two EuroSciPy conferences in Leipzig and as chair of PyConDE for the second time this year. Therefore, my thanks to the organizers are all the more sincere. They did a great job, especially the two men on the ground in Brussels, Nicolas Pettiaux and Pierre de Buyl, who put in many hours of work. And they will kindly host EuroSciPy again in 2013.

1 There were two more days with sprints, but I did not participate and cannot write about them.
