Today I gave a talk about non-negative matrix factorization (NMF), an algorithm used, for example, in data mining to find similarities
in a large number of documents. In this course (Data mining and pattern recognition) we use Python to implement
some nice exercises (spam filters, document clustering, face recognition, etc.), and as preparation for the talk I
implemented a few things in Python.
There were several aspects of Python I was looking forward to:
- Program structure defined by indentation
- Operation or function syntax on variables
- Speed (well, for an interpreted language…)
The first hours were quite fun (our professor gave a brief introduction). The documentation was a bit troublesome, but
I guess I just didn't find the right one. The docs on the Python website are nice if you have enough time to actually
read them, but I really miss a reference like the Ruby core documentation. There you have everything in one place and
can find almost anything with a simple browser search.
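If the problem is mainly finding the right reference, Python ships with a few lookup tools of its own. The following is a minimal sketch, assuming Python 3: it reads a docstring at runtime with the standard `inspect` module. The comments list the interactive equivalents; `python -m pydoc -b` starts a local documentation server that can be browsed and searched in a web browser, which comes close to the "one place" reference described above.

```python
import inspect

# Docstrings are ordinary runtime data, so the interpreter itself can serve
# as a quick reference for built-in and library functions.
doc = inspect.getdoc(str.join)
print(doc)

# Interactive equivalents (not executed here):
#   help(str.join)            same text inside the interactive interpreter
#   python -m pydoc str.join  same text from the shell
#   python -m pydoc -b        local HTML docs server opened in a browser
```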
Program structure defined by indentation
Using indentation to define the program structure was also nice at first. I really like the "executable pseudo code" idea
because I usually use indentation in my notes to structure algorithms. However, when writing larger functions or classes I
had a hard time "scanning" the source code. The start of a code block can be seen immediately, but the end?
Maybe I'm a bit too used to languages like C, Java or Ruby, which have very clear end markers, but I spent much
more time in Python code actually searching for the end of a code block (function, method, …) than in other languages. It
feels like my eyes are hanging somewhere between the lines, not really sure where to go next. Again, it's probably just
a matter of getting used to it. Maybe a light background color for whitespace in gedit will help.
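A small illustration of the point above, as a sketch in Python 3: a block ends at the first line that is indented less than its body, and there is no `end` keyword to look for. The "ends here" comments are only for the human reader; Python ignores them. The hypothetical helper `indentation_ok` shows that, because indentation *is* the structure, a mis-indented line is a syntax error rather than a cosmetic problem.

```python
def classify(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append("even")
        else:
            result.append("odd")
        # the if/else block ends here (dedent back to the for body)
    # the for loop ends here (dedent back to the function body)
    return result
# the function ends here (dedent back to module level)


def indentation_ok(source):
    """Return True if source compiles, False if it has an IndentationError."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


print(classify([1, 2, 3]))
print(indentation_ok("def f():\n    x = 1\n      return x\n"))  # unexpected indent
```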
Operation or function syntax on variables
When doing "theoretical" work on algorithms we often defined abstract data types. These are basically a list
of the types involved, the operations that can be used on the data type and the behavior of those operations (usually expressed
in the form of axioms). The operations of an array data type might look like this:
insert(Array, Number, Value) → Array
get(Array, Number) → Value
slice(Array, Number, Number) → Array
delete(Array, Number) → Array
len(Array) → Number
join(Array, String) → String
…
In the pseudo code of algorithms I use this "operation" syntax because I usually think first about what to do and only then
about the data that gets manipulated.
a = ["hello", "world"]
len(a) → 2
slice(a, 0, 1) → ["hello"]
join(a, ", ") → "hello, world"
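The array operations above can be written almost literally as plain Python functions. This is a sketch, not a standard library API: the function names are hypothetical, `slice_` and `length` are renamed to avoid shadowing Python's built-in `slice` and `len`, indices are assumed to be non-negative, and each operation returns a new list to match the "→ Array" signatures.

```python
# Hypothetical operation-style helpers mirroring the abstract data type above.

def insert(array, index, value):
    """insert(Array, Number, Value) → Array (index assumed non-negative)"""
    return array[:index] + [value] + array[index:]

def get(array, index):
    """get(Array, Number) → Value"""
    return array[index]

def slice_(array, start, end):
    """slice(Array, Number, Number) → Array, end index excluded"""
    return array[start:end]

def delete(array, index):
    """delete(Array, Number) → Array (index assumed non-negative)"""
    return array[:index] + array[index + 1:]

def length(array):
    """len(Array) → Number"""
    return len(array)

def join(array, separator):
    """join(Array, String) → String"""
    return separator.join(array)


a = ["hello", "world"]
print(length(a))        # 2
print(slice_(a, 0, 1))  # ['hello']
print(join(a, ", "))    # hello, world
```

With Python's built-ins the same three lines are `len(a)`, `a[0:1]` and `", ".join(a)`: one function call, one piece of slicing syntax and one method call.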
I was looking forward to Python supporting this style of programming. However, it looks like this operation or function style
of programming is currently being replaced by object orientation. While that is not a bad idea, the results look somewhat strange.
For example joining an array of words:
a = ["hello", "world"]
", ".join(a)
Why is the string ", " responsible for joining the array a?!
The second point is that join is a method. More precisely, you first have to think about the data that gets
manipulated (here the ", ", then a) and only somewhere in between about what is actually done (join). To stay
in a world of operations you could write
join(a, ", ")
where the interpreter could check whether the first parameter a defines a join method and call it. This way you could
stay in the mental model of operations, and it would also be consistent with the way you define methods in Python, with
self as the first parameter.
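Part of this already works today, and the rest can be approximated. The sketch below, assuming Python 3, first calls `str.join` in function style, with the separator passed explicitly as the `self` argument. It then uses the standard `functools.singledispatch` decorator to build a hypothetical operation-style `join(a, ", ")` that picks its implementation from the type of the first argument, roughly the lookup described above. Converting items with `str()` is a design choice of this sketch, not how `str.join` behaves.

```python
from functools import singledispatch

a = ["hello", "world"]

# A method is just a function whose first argument is the instance ("self").
print(str.join(", ", a))  # hello, world


# Hypothetical operation-style join: dispatches on the type of the first argument.
@singledispatch
def join(collection, separator):
    raise TypeError(f"join() not supported for {type(collection).__name__}")


@join.register(list)
@join.register(tuple)
def _(collection, separator):
    # Convert items to str so that lists of numbers can be joined as well.
    return separator.join(str(item) for item in collection)


print(join(a, ", "))         # hello, world
print(join((1, 2, 3), "-"))  # 1-2-3
```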
On the downside you would lose the ability to do neat method chaining. However, in many languages that offer it, it is
rarely used anyway (e.g. Java). The syntax for creating a new instance of a class would also look somewhat strange:
new(Post)
given that new is an operation of the Post class (like in Ruby).
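Both trade-offs can be seen in a few lines of Python 3. This is a sketch: the string and the `Post` class are hypothetical. Method chaining reads left to right, while the same steps written as operations read inside-out. A `new(Post)` operation is easy to emulate because Python classes are callable, so the hypothetical `new` below only forwards its arguments.

```python
# Method chaining: each call returns an object whose method is called next.
text = "  Hello Python World  "
words = text.strip().lower().split()
print(words)  # ['hello', 'python', 'world']

# The same steps in operation style read inside-out.
words_ops = str.split(str.lower(str.strip(text)))


# Hypothetical generic constructor operation, in the spirit of new(Post).
def new(cls, *args, **kwargs):
    return cls(*args, **kwargs)


class Post:
    def __init__(self, title):
        self.title = title


post = new(Post, "First impressions")
print(post.title)  # First impressions
```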
Does Python really need another unremarkable form of object orientation (OO with a dot)? Why not spice up OO a bit to amplify
some unique ideas and mental models behind Python?
Speed
In this case Python really showed off; to be more exact, numpy.matrix did. I did some rather extensive matrix multiplications with
relatively large matrices (e.g. 2000×500) and the performance was quite impressive. It calculated the results in seconds, where
the Ruby matrix class took minutes to do the work (and it is not really made for this kind of job).
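For a sense of what such matrix-heavy work looks like, here is a minimal NMF sketch using the multiplicative update rules of Lee and Seung for the squared Frobenius error ||V - WH||². It is not the implementation used for the talk. The rank, iteration count and random 2000×500 input are hypothetical stand-ins for a term-document matrix. It uses plain NumPy arrays rather than `numpy.matrix`, which the NumPy documentation no longer recommends. With arrays, `@` is the matrix product and `*` is element-wise, which is exactly what the multiplicative updates need. With `numpy.matrix`, `*` would mean matrix multiplication instead.

```python
import numpy as np


def nmf(V, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) into W (m x rank) and H (rank x n).

    Multiplicative updates for the squared Frobenius error; eps avoids
    division by zero. W and H stay non-negative because every update
    multiplies by a non-negative ratio.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


# Hypothetical data: 2000 terms x 500 documents with random non-negative counts.
rng = np.random.default_rng(42)
V = rng.random((2000, 500))
W, H = nmf(V, rank=10, iterations=50)
error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, round(float(error), 3))
```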
Anyway, these are just my thoughts about Python after playing with it for a few days. It just feels like there are two directions
within the current Python language (operations and OO), and this looks a bit inconsistent from my point of view. But again, maybe
I just don't know the right Python features yet.