What is the optimal number of lines of code per method? How long should you make a function?
For years I've leaned on a rule of thumb: every method must be viewable on a single screen. I assume I read it somewhere; it seemed logical to me and I've followed it ever since.
Recently, though, I thought I'd go find where I got this rule from, so I looked up my old copy of Code Complete. Section 5.5, "How Long Can a Routine Be?", points to a few studies suggesting that my rule of thumb is wrong! Up to 200 lines of code per routine was acceptable. Something made me a little suspicious of this, partly because they use the word "routine" instead of "method". That leads me to think they were thinking in terms of C routines, and that some of the studies in the book may be out of date. Earlier on, the book even argues for using routines at all (well, duh!), which I find just amazing. Still, methods must have come from somewhere.
So I searched around the net and found an article on encapsulation and module size. It claims a module is most effective at between 200 and 400 functional lines of code, which works out to 400-800 lines once you add comments, white space, etc. That is more in line with my thinking.
Still, this didn't solve my problem: where did I hear this rule? Well, the answer is I can't find it. But I have a guess as to where it comes from. Earlier in Code Complete it mentions "cohesion", breaking it up into 4 good types: functional, sequential, communicational and temporal.
- Functional cohesion is when a method performs only one operation, e.g. sin, GetLocation, CalculateLoanPayment. I have always striven for this as it makes my code easier for me to understand.
- Sequential cohesion is when a group of things have to be done in order. I find this common enough, but I break each step in the sequence out into functionally cohesive method calls.
- Communicational cohesion is when the method operates on the same data but may do two things with it. I generally don't like methods like this.
- Temporal cohesion is when things are grouped because they occur at the same time, e.g. OnLoad.
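To make the first two types a bit more concrete, here is a minimal sketch in Python. It is only an illustration: calculate_loan_payment echoes the CalculateLoanPayment example above, while parse_loan_request, format_payment and quote_loan are hypothetical names made up for this sketch, and the payment formula is the standard fixed-rate amortization formula, assumed here rather than taken from the book.

```python
def parse_loan_request(text):
    """Functionally cohesive: turns 'principal,rate,months' into numbers."""
    principal, annual_rate, months = text.split(",")
    return float(principal), float(annual_rate), int(months)


def calculate_loan_payment(principal, annual_rate, months):
    """Functionally cohesive: computes the monthly payment and nothing else."""
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        # Interest-free loan: just spread the principal evenly.
        return principal / months
    factor = (1 + monthly_rate) ** months
    return principal * monthly_rate * factor / (factor - 1)


def format_payment(payment):
    """Functionally cohesive: renders a payment for display."""
    return f"{payment:.2f} per month"


def quote_loan(text):
    """Sequentially cohesive: the steps must run in this order,
    but each step is delegated to a single-purpose function."""
    principal, annual_rate, months = parse_loan_request(text)
    payment = calculate_loan_payment(principal, annual_rate, months)
    return format_payment(payment)
```

Each function fits on a screen with room to spare, and quote_loan reads as a list of the steps rather than the details of any one of them.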
I think that in my efforts to create functionally cohesive methods I have adopted a practice that the studies of the mid-80s did not support. To make a function cohesive I have to see it all at once and confirm it really does only one thing. Even now, if I run into a big method I bust it down to understand it.
The later article I found, while not directly supporting small methods, pretty much makes it impossible to have long methods and shortish classes at the same time. Follow both and you would end up with just 2-6 methods per class, each ranging from 65 to 200 lines of code.
Code Complete hints that functional cohesion is the most important kind of cohesion, and that seems pretty obvious to me. One method to do one thing: how easy is that?
What this leads me to believe is that a new generation of developers has been trained in OO methods and thinks differently when coding, and that the studies done in the 80s are probably out of date with regard to OO coding practices. For one thing, there is no mention of cyclomatic complexity or Halstead complexity measures.
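For anyone who hasn't met cyclomatic complexity, here is a rough sketch of the idea in Python, using the standard library's ast module. It is an approximation under stated assumptions, not the formal definition: it simply starts at 1 and adds one for each branching construct it finds. The function name approx_cyclomatic_complexity and the choice of which node types count as a branch are my own simplifications.

```python
import ast

# Node types counted as one decision point each (a simplifying assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source):
    """Return 1 plus the number of decision points found in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


EXAMPLE = '''
def f(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        pass
    return 0
'''

if __name__ == "__main__":
    print(approx_cyclomatic_complexity(EXAMPLE))
```

A method that only does one thing tends to score low on a measure like this whatever its line count, which is a different question from the one the older length studies were asking.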
I have a few electrical engineering friends who still seem to be taught to code in sequence rather than in encapsulated object terms. Is this better or worse? I'm not sure. We know that sequential code can become spaghetti, but that has more to do with bad cohesion and coupling than with method length.
Is OO better? I suspect so, in that it encourages encapsulation and functional cohesion. Note how I said "encourages", because some coders just never grok these two terms. That is probably bad, but maybe not if some of the old Code Complete studies still apply. I also like to create logically cohesive objects, which is what OO (object orientation) is all about. BigMother is not a good name for an object (or a database table, which I once ran into).
So in the end I think I chose my rule of thumb based on functional cohesion and cyclomatic complexity, without having a study to base it on.