Wednesday 4 January 2017

Pair programming

I appreciate pair programming more and more. It used to be a controversial practice for me: I found it uncomfortable to share my computer, my tools, my personal universe with someone. Yet this practice is really powerful. Benefits:

  • It brings more focus. Having a pairing partner, I'm less prone to procrastinate. Even better, I feel like we are less likely to be disturbed by other people.
  • You catch more typos before running the compiler and / or the tests, which saves a significant amount of time.
  • It's the only efficient way to share practices like TDD, shortcuts and habits that both pairing partners use to be productive.
  • You practice code reviews on the fly, which is less boring than doing them later (you know later can mean never, don't you?)

Feeling comfortable with pair programming does not come naturally. Here is some advice I can think of:

  • a coding dojo is a "safe" way to experience pair programming. However, dojos attract XP enthusiasts who may already know or use pair programming, and you cannot force people to attend one.
  • use a timer to switch roles and take breaks. Consider the pomodoro technique. I use tomighty.
  • if you are more comfortable with pair programming than your pairing partner, take care to include her or him in the process. Force yourself to ask questions, and ask how he or she feels. Beware of silence. It's pair programming, not rubber ducking.

I know there is a more radical technique in the field today, called mob programming. I'm sure it has benefits in some thorny cases. I am not a practitioner myself, but I don't think it is something you do systematically. And I prefer learning to walk well before running.

What do you tell your boss in order to do pair programming? Nothing. Just do it, and measure the improvement. You may not be 2 times faster, but you will be faster and you will improve the overall quality of the code.

Monday 2 January 2017

Code reviews

I have to admit that I have done few code reviews in my career. This is bad. Some that I did, and some that other devs did on my code, focused mainly on style: comments, ordering of methods, etc. This is worse. You see, lots of bugs cannot be found by unit tests or even performance tests. Examples:

  • hard-coded environment-specific values, like server names.
  • bad usage of external resources, like opening a database connection in a loop.
  • shared connections being closed by users, causing invalidation of open cursors (OK, you can catch this one with performance testing).
  • local directory usage on distributed systems. …
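
To make the second item concrete, here is a minimal sketch using Python's standard sqlite3 module. The `messages` table and both function names are hypothetical, chosen only for illustration. Both functions return the same result, which is exactly why a unit test would not tell them apart, while a reviewer spots the difference at once.

```python
import sqlite3


def count_rows_slow(db_path, ids):
    """Anti-pattern: opens and closes one connection per loop iteration."""
    total = 0
    for row_id in ids:
        conn = sqlite3.connect(db_path)  # a new connection on every pass
        try:
            cur = conn.execute(
                "SELECT COUNT(*) FROM messages WHERE id = ?", (row_id,)
            )
            total += cur.fetchone()[0]
        finally:
            conn.close()
    return total


def count_rows(conn, ids):
    """Same logic, reusing a single connection supplied by the caller."""
    total = 0
    for row_id in ids:
        cur = conn.execute(
            "SELECT COUNT(*) FROM messages WHERE id = ?", (row_id,)
        )
        total += cur.fetchone()[0]
    return total
```

With a real database server, where each connection costs a network round trip and an authentication step, the first version is likely to hurt much more than it does with a local SQLite file.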

I faced all of them recently. When we were lucky, they showed up in QA. Some of them were found by the client in production. All of them could have been caught by a proper code review.

Believe me, I will do more code reviews. Maybe you should too in 2017.

Monday 12 December 2016

Corporate hackathon

A few months ago I took part, with a colleague, in a hackathon organized by my company. The goal was to deploy a service on a Raspberry Pi 3 that could ingest 1 million messages and provide a synthesis of the posted data in JSON format.

It was an opportunity to get out of routine work and test new technologies. Here are the solutions we thought of:

  • Python + Redis
  • Node JS + Redis
  • Elixir

I discarded Python pretty soon, because it was not fast enough. Node looked promising and we found cool stuff to generate the synthesis using Redis. We also tried a clustered server, which gave us the possibility to run 4 Node processes. Results were OK on our laptops.

We were waiting for the official performance test suite from the organizers (using [gatling.io][gat], hence implemented in Scala). It was buggy for a long time and we could not rely on it to validate our solution. We were also waiting for the organizers to provide us with an RPI. With the final approaching, I decided to buy one.

The tests on the RPI were an unpleasant surprise: I did not expect such a difference in speed. From 5000 queries per second, we went down to about a third of that.

We had another surprise when we received the final version of the official test suite: our synthesis was wrong!

I spent the 2 nights before the final deploying a version based on Node and PostgreSQL to fix this, as I was sure I could compute the synthesis in one SQL query. The request rate dropped below 1000 per second though, which meant we would need more than 15 minutes to ingest 1,000,000 messages. That's pretty long.
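
The arithmetic behind that estimate can be checked with a short sketch. It assumes one message per request, which the figures above suggest but do not state; the function name is hypothetical.

```python
def ingestion_minutes(messages, requests_per_second):
    """Minutes needed to ingest `messages`, assuming one message per request."""
    return messages / requests_per_second / 60


# At exactly 1000 requests per second, one million messages take a bit
# under 17 minutes; any lower rate takes longer.
minutes_at_1000 = ingestion_minutes(1_000_000, 1000)
```

The same function shows why the rate matters so much: halving it doubles the ingestion time.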

Another flaw: the JSON generated by Node was not in the expected format for numbers. I actually put in an ugly hack at 3 AM to be able to serialize a Double with 2 digits of precision (you'll see it in our code, on the node_pg branch).

The winner of the challenge based his implementation on Java, using Undertow as a web server and MapDB for storage. His solution ingested the million messages in less than 3 minutes and 10 million messages in less than 30 minutes, whereas our solution failed miserably after 3 hours of processing! He also had no problem with synthesis generation, as his serialization method corresponded exactly to the test suite's deserialization method.

So we did not do as well as we expected, but we learned a lot:

  • Java is not a thing of the past, and concurrency based on multithreading was appropriate here. It's been a great reminder that the relevance of a solution strongly depends on the context.
  • We should have deployed on the RPI sooner, to give ourselves time to adapt.
  • PostgreSQL is a beast that is hard to tune if you don't have the time. I don't think it's appropriate on an RPI. It seems hard to limit IO.

Anyway, I'm happy with the results, since we did not put that much effort into it and we still won a prize (a Fitbit Blaze that I'm currently trying to sell). Also, I did not have Internet at home at the time, so I did all my tuning offline! It was a great experience and we'll do better next time!

Wednesday 23 November 2016

My talk at Pyconfr 2016

This year I got the chance to speak at Pyconfr! It was motivating for me, as it is a national event and, as you may know, I really like Python.

I was a bit nervous, of course, but as I rehearsed before the talk, I gained confidence. That was lucky, because I was not so motivated to rehearse. But if I had not, my talk would have been lame. I mean, more than it actually was!

For the record, I submitted 2 talks to the CFP this year. The first one was about Hypothesis and property-based testing. I really wished it would be accepted, as it would have motivated me to explore this library.

The one that was accepted is "Help, we do not have any Python project in my company", which was not so easy to present. You can watch the video here!

And, yes, it's in French.

Tuesday 1 November 2016

Fed up with properties files

Once again, in the source code of our applications, I find lots of constants defined in a properties file in Java. Why is that? In these files, one can only define strings. References to property entries are also strings. They're not checked at build time. Hence, they cannot be interpreted by an IDE. You cannot easily jump to your property value. It's a shame, as this is why we use heavy IDEs! Why carry out such sabotage???

In my case, a Java file references constants that are declared in another Java file where they reference… Properties! That's indirection for the sake of indirection.

Properties are useful in lots of situations though. You sometimes want to be able to configure your application without building it again. You want to be able to set up parameters specific to your running environment. You want to be able to change some parameters, such as logger configuration, and see them applied without restarting the application.

That represents very few cases though. It is not a good idea to store every constant used by your application in those files. In most cases, a change to a constant value should be thoroughly tested. It is not something that just anyone should do without going through a build phase.

So please, stop using properties by default. Create one only when you cannot avoid it.
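
The difference between string keys and named constants can be sketched in Python, the language used elsewhere on this blog. This is only an analogy: `PROPERTIES`, `Constants` and the key names are hypothetical, and in Java the second typo would be rejected at compile time rather than at run time.

```python
# Hypothetical string-keyed configuration, similar in spirit to a properties file.
PROPERTIES = {"retry.count": "3"}


class Constants:
    """Hypothetical constants holder, referenced by name instead of by string key."""

    RETRY_COUNT = 3


# A typo in a string key goes unnoticed: the lookup silently returns None.
typo_value = PROPERTIES.get("retry.cuont")

# A typo in a constant name fails loudly, and linters or IDEs usually flag it
# before the code even runs.
try:
    Constants.RETRY_CUONT
except AttributeError as error:
    typo_error = error
```

Note also that the property value is the string "3" while the constant is the integer 3: every reader of the properties file has to do the conversion again.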

Tuesday 25 October 2016

Namedtuples in Python

Lately, I added namedtuples to my Python programming vocabulary. They allow you to create data structure classes in one line.

Start by importing them from the collections module:

from collections import namedtuple

Define a new class as a namedtuple:

Person = namedtuple("Person", ("firstname", "lastname"))

You can now create new instances of Person, as you would do with any other class:

john = Person("John", "Doe")

And if you print it:

In [8]: print(john)
Person(firstname='John', lastname='Doe')

(yes, I use IPython, don't you?)

Now let's see what namedtuples give you.

Unpacking:

In [9]: f,l = john

In [10]: f
Out[10]: 'John'

In [11]: l
Out[11]: 'Doe'

Field access by name:

In [12]: john.firstname
Out[12]: 'John'

In [13]: john.lastname
Out[13]: 'Doe'

You also have access by index:

In [27]: john[0]
Out[27]: 'John'

In [28]: john[1]
Out[28]: 'Doe'

(OK, I tried some stuff while writing the article)

And that means you can iterate over them, great!

In [31]: for value in john:
   ....:     print(value)
   ....:
John
Doe

You can retrieve the indexes of defined values (like in tuples):

In [29]: john.index('Doe')
Out[29]: 1

And count the occurrences of the values for free (also like in tuples, which is useless in my example):

In [30]: john.count('John')
Out[30]: 1

There's more. Unlike plain classes, equality is defined for you for free:

In [32]: john2 = Person('John', 'Doe')

In [33]: john == john2
Out[33]: True
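
For contrast, here is a small sketch of what happens with a hand-written class that does not define `__eq__`. `PlainPerson` is a hypothetical class introduced only for this comparison.

```python
from collections import namedtuple

Person = namedtuple("Person", ("firstname", "lastname"))


class PlainPerson:
    """Hypothetical hand-written equivalent, without an __eq__ method."""

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


# Namedtuples compare by value, like regular tuples.
same_tuples = Person("John", "Doe") == Person("John", "Doe")  # True

# Without __eq__, Python falls back to identity: two distinct objects differ.
same_plain = PlainPerson("John", "Doe") == PlainPerson("John", "Doe")  # False
```

One thing to keep in mind: since a namedtuple is a tuple, it also compares equal to a plain tuple holding the same values.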

And last but not least, like standard tuples, namedtuples are immutable:

In [34]: john.firstname = "Billy"
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-a7d9f29302d8> in <module>()
----> 1 john.firstname = "Billy"

AttributeError: can't set attribute

This last one is awesome.
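
When a modified value is needed, namedtuples provide the `_replace` method, which returns a new instance and leaves the original untouched. A minimal sketch with the same `Person` class:

```python
from collections import namedtuple

Person = namedtuple("Person", ("firstname", "lastname"))

john = Person("John", "Doe")

# _replace builds a new Person; john itself is not modified.
billy = john._replace(firstname="Billy")
```

Despite the leading underscore, `_replace` is part of the documented namedtuple API; the underscore only avoids clashes with field names.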

So what is cool about all that? The great strength of tuples is that they are immutable, unlike lists. That's why, when you have a sequence of values that is not subject to change, you should consider creating it as a tuple by default. Tuples are memory efficient and give you the assurance that nothing will alter them in your application. Besides, the syntax to create them is a bit shorter:

In [39]: t = 1, 2, 3, 4, 5 # you don't even need the parens!

In [40]: t
Out[40]: (1, 2, 3, 4, 5)
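
The memory claim can be checked roughly with `sys.getsizeof`. This is a sketch that assumes CPython; the exact byte counts depend on the interpreter version and platform, so none are shown here.

```python
import sys

as_tuple = (1, 2, 3, 4, 5)
as_list = [1, 2, 3, 4, 5]

# getsizeof reports the size of the container itself, not of the integers it
# references. On CPython the tuple comes out smaller than the list.
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)
```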

Named tuples extend this ability to any data structure you could create, giving you access to fields by name for readability.

The only drawback is that the one-line form leaves no room for methods or properties, as you could define on immutable data structures in other languages; to add them, you have to subclass the generated class. Yet it is still a nice feature of Python.
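
Subclassing the class returned by `namedtuple` is one way to attach behaviour. A minimal sketch, where the `fullname` property is a hypothetical addition:

```python
from collections import namedtuple


class Person(namedtuple("Person", ("firstname", "lastname"))):
    # An empty __slots__ avoids creating a per-instance __dict__, so instances
    # stay as light as plain namedtuples and cannot grow new attributes.
    __slots__ = ()

    @property
    def fullname(self):
        """Computed from the fields; nothing extra is stored."""
        return "{} {}".format(self.firstname, self.lastname)


john = Person("John", "Doe")
print(john.fullname)  # prints "John Doe"
```

The instance keeps everything shown earlier: unpacking, access by index, equality and immutability.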

Tuesday 23 August 2016

Playing with Tkinter

I committed to producing a GUI for a utility at work. My idea was to use the Tkinter module in Python. It was a great pretext to use it for the first time!

Tkinter is a GUI toolkit provided with Python’s standard distribution. It’s great since it avoids the burden of installing an external dependency on target systems. I’m impressed because the development is really simple. I’ve faced far fewer difficulties than I had with wxWindow in the past (I’m also more experienced though).

Before coding, I believed that the toolkit would produce a bad-looking UI. That is actually the case, unless you use the themed widgets from the ttk submodule.

I struggled to find decent documentation, as there is too little of it on Python’s website. I found something relevant here (doc also available in PDF). Stack Overflow takes care of the rest.

Epilogue: in the end, the piece of software will be coded in VB.NET and integrated into a larger app. It’s a shame! Still, I had fun working with Tkinter.