First encounter with Python

April 21st, 2010 python, datamining, university, programming

Today I gave a talk about non-negative matrix factorization, an algorithm used e.g. in data mining to find similarities in a large number of documents. In this course (Data Mining and Pattern Recognition) we use Python to implement some nice exercises (spam filters, document clustering, face recognition, etc.), and as preparation for the talk I implemented some stuff in Python.

There were several aspects of Python I was looking forward to:

Program structure defined by indentation
Operation or function syntax on variables
Speed (well, for an interpreted language…)

The first hours were quite fun (our professor gave a brief introduction). The documentation was a bit troublesome, but I guess I just didn't find the right one. The docs on the Python website are nice if you have enough time to actually read them, but I really miss a reference like the Ruby core documentation. There you have everything in one place and can find almost anything with a simple browser search.

Program structure defined by indentation ¶

Using indentation to define the program structure was also nice at first. I really like the "executable pseudo code" thing because I usually use indentation in my notes to structure algorithms. However, when writing larger functions or classes I had a hard time "scanning" the source code. The start of a specific code structure can be seen immediately, but the end? Maybe I'm a bit too used to languages like C, Java or Ruby, which have very clear end markers, but I spent much more time in Python code actually searching for the end of a code block (function, method, …) than in other languages. It feels like my eyes are hanging somewhere in between the lines, not really sure where to go next. Again, it's probably just a matter of getting used to it. Maybe a light background color for whitespace in gedit will help.

Operation or function syntax on variables ¶

When doing some "theoretical" stuff with algorithms we often defined abstract data types. These are basically a list of the types involved, the operations that can be used on the data type and the behavior of the operations (usually expressed in the form of axioms). The operations of an array data type might look like this:

insert(Array, Number, Value) → Array
get(Array, Number) → Value
slice(Array, Number, Number) → Array
delete(Array, Number) → Array
len(Array) → Number
join(Array, String) → String
…

In the pseudo code of algorithms I use this "operation" syntax because I usually think first about what to do and second about the data that gets manipulated.

a = ["hello", "world"]
len(a) → 2
slice(a, 0, 1) → ["hello"]
join(a, ", ") → "hello, world"
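Parts of this operation style can already be written in Python. The following is only a small sketch: `len` is a built-in function, and the standard `operator` module exposes indexing and slicing as plain functions. The variable names are chosen just for this illustration.

```python
import operator

a = ["hello", "world"]

# len is a built-in function, so it matches the pseudo code directly.
length = len(a)

# operator.getitem(a, i) is the function form of a[i].
second = operator.getitem(a, 1)

# A slice object turns a[0:1] into a function call as well.
first_part = operator.getitem(a, slice(0, 1))

print(length)      # 2
print(second)      # world
print(first_part)  # ['hello']
```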

I was looking forward to Python supporting this style of programming. However, it looks like this operation or function style of programming is currently being replaced by object orientation. While this is not a bad idea, the results look somewhat strange. For example, joining an array of words:

a = ["hello", "world"]
", ".join(a)

Why is the string ", " responsible for joining the array a?!

The second point is that join is a method. Or, to be exact, that you first have to think about the data that gets manipulated (here the ", ", then a) and somewhere in between about what is actually done (join). To stay in a world of operations you could write

join(a, ", ")

where the interpreter could check whether the first parameter a defines a join method and, if so, call it. This way you could stay in the mental model of operations, and it would also be consistent with the way you define methods in Python with the first parameter self.
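The interpreter does not do this lookup, but the idea can be approximated with an ordinary function. This is a minimal sketch, not a language feature: the free function `join` and the `Words` class are hypothetical names invented for the example, and the fallback to `str.join` is an assumption needed because built-in lists define no join method.

```python
def join(collection, separator=""):
    """Operation-style join: join(a, ", ") instead of ", ".join(a)."""
    # Look the method up on the first argument, as described above.
    # Caveat: a plain str also has a join method, so passing a string
    # as the collection would dispatch to str.join with swapped roles.
    method = getattr(collection, "join", None)
    if callable(method):
        return method(separator)
    # Built-in lists define no join method, so fall back to str.join.
    return separator.join(collection)


class Words:
    """Hypothetical collection that defines its own join method."""

    def __init__(self, *words):
        self.words = list(words)

    def join(self, separator):
        return separator.join(self.words)


a = ["hello", "world"]
result_list = join(a, ", ")
result_words = join(Words("hello", "world"), ", ")

print(result_list)   # hello, world
print(result_words)  # hello, world
```

Python also allows calling the method through the class, as in `str.join(", ", a)`, although the separator still comes first there.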

On the downside, you would lose the ability to do neat method chaining. However, in many languages that have this ability it isn't used much anyway (e.g. Java). The syntax for creating a new instance of a class would also look somewhat strange:

new(Post)

given that new is an operation of the Post class (like in Ruby).
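Such a `new` operation could be sketched as a plain function that forwards to the class. Both `new` and the `Post` class with its `title` attribute are hypothetical and exist only for this illustration; Python itself creates instances by calling the class directly.

```python
def new(cls, *args, **kwargs):
    """Hypothetical operation-style constructor: new(Post) instead of Post()."""
    # Calling a class object creates and initializes an instance.
    return cls(*args, **kwargs)


class Post:
    """Hypothetical example class."""

    def __init__(self, title="untitled"):
        self.title = title


post = new(Post, title="First encounter with Python")
print(post.title)  # First encounter with Python
```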

Does Python really need another unremarkable form of object orientation (OO with a dot)? Why not spice up OO a bit to amplify some unique ideas and mental models behind Python?

Speed ¶

In this case Python really showed off. To be more exact, numpy.matrix did. I did some rather extensive matrix multiplications with relatively large matrices (e.g. 2000×500) and the performance was quite impressive. It calculated the results in seconds where the Ruby matrix class took minutes to do the work (and is not really made for this kind of job).
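A multiplication of roughly that size can be timed with a few lines. This is only a sketch under assumptions: it uses plain NumPy arrays and the `@` operator (available in newer Python versions) rather than the numpy.matrix class mentioned above, the matrices are filled with random values, and the measured time depends entirely on the machine.

```python
import time

import numpy as np

# Random test data; the shapes follow the 2000×500 example above.
rng = np.random.default_rng(seed=0)
a = rng.random((2000, 500))
b = rng.random((500, 2000))

start = time.perf_counter()
product = a @ b  # matrix product, result has shape (2000, 2000)
elapsed = time.perf_counter() - start

print(product.shape)
print(f"multiplication took {elapsed:.3f} s")
```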

Anyway, these are just my thoughts about Python after playing with it for some days. It just feels like there are two directions within the current Python language (operations and OO) and this looks a bit inconsistent from my point of view. But again, maybe I just don't know the right Python features at the moment.

#1 by Marc April 22nd, 2010 15:14

I had the exact same experience when playing with python.
I'd love to be able to like it more, but I just can't seem to wrap my brain around the weird operation/function mixture and the "strange" documentation just kept me from diving in deeper.

Concerning speed compared to ruby:
I'm happy that I now do 1.9.x/jruby stuff only.
Using RVM (http://rvm.beginrescueend.com/), switching between rubinius, jruby and 1.9.2-head is a breeze 🙂

#2 by pstradomski April 22nd, 2010 16:25

The fact that join is defined on strings allows it to join any sequence (anything that can provide an iterator).
If it was defined on list, then all other sequences would have to provide their own join methods, or derive from common base class. Both are undesirable.

By the way: your validator for website (when posting comments) is broken.

#3 by Reid April 22nd, 2010 20:06

Fine post, just to pick some nits: the reason that the join function is defined in the string class is that you can only join a collection of strings together. If it were on the list class, it would throw an error if the elements were not strings, and you wouldn't be able to join other collections of strings without somehow inheriting from it or making modifications to call the original function.

That said, one could argue for a separate string_join(", ", items) function. Python isn't the most consistent language when it comes to methods vs. functions.
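The behavior discussed in comments #2 and #3 can be tried out directly. The sketch below uses only the built-in `str.join`; the variable names are made up for the example.

```python
words_list = ["hello", "world"]
words_tuple = ("hello", "world")
words_gen = (w.upper() for w in words_list)  # a generator, not a list

# str.join accepts any iterable of strings.
from_list = ", ".join(words_list)
from_tuple = ", ".join(words_tuple)
from_gen = ", ".join(words_gen)

# Elements that are not strings raise a TypeError.
try:
    ", ".join([1, 2, 3])
    raised = False
except TypeError:
    raised = True

# Converting the elements first makes the join work.
from_numbers = ", ".join(str(n) for n in [1, 2, 3])

print(from_list)     # hello, world
print(from_tuple)    # hello, world
print(from_gen)      # HELLO, WORLD
print(raised)        # True
print(from_numbers)  # 1, 2, 3
```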

#4 by Marc April 23rd, 2010 14:49

It would also be interesting to see how narray (http://narray.rubyforge.org/) or GSL bindings (http://rb-gsl.rubyforge.org/matrix.html) compare to the C-extended library in Python.

#5 by Stephan Soller April 24th, 2010 19:00

Thanks for the validation check hint, pstradomski. It's fixed now.

I now do understand why join() is defined for strings (thanks for the explanation) but I still don't like it. Joining elements is definitely behavior of a simple collection and not of the part repeatedly used to glue the pieces (the existence of a glue part doesn't make sense without a collection). What if the glue part isn't supposed to be a string or if joining a special collection doesn't require a glue part at all (e.g. joining processes maybe)?

Such cases of common behavior are usually handled by mixins (Ruby, Smalltalk, Lisp, etc., even in D) or multiple inheritance (C++). It looks like Python also features mixins, so the question is: why wasn't join() (or other general behavior of simple collections) defined in a mixin? The Enumerable mixin of Ruby demonstrates the advantages of this approach quite well.
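In Python, such a mixin could look roughly like the sketch below. The names `Joinable`, `WordList` and `Countdown` are hypothetical, and converting every element with `str()` is a design choice made for this example, not something the standard library does.

```python
class Joinable:
    """Hypothetical mixin: adds join() to any class that is iterable."""

    def join(self, separator=""):
        # Relies only on iteration, similar in spirit to Ruby's Enumerable,
        # which builds its methods on top of each.
        return separator.join(str(item) for item in self)


class WordList(Joinable, list):
    """A list that gains join() through the mixin."""


class Countdown(Joinable):
    """A non-list collection that only implements iteration."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


words = WordList(["hello", "world"])
joined_words = words.join(", ")
joined_count = Countdown(3).join(" ")

print(joined_words)  # hello, world
print(joined_count)  # 3 2 1
```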

Anyway, thanks for the feedback, guys, and for the links to Ruby narrays and the GSL bindings, Marc. I'll give them a try. 🙂
