Coding Style Guidelines
OML is written in C. Currently the code exhibits a mix of different styles, but we'd like to move it towards a clean, simple C style.
Whitespace
Whitespace at the end of lines is annoying. Git provides a default pre-commit hook that prevents a commit that violates some sensible whitespace rules. Please enable it by doing this immediately after cloning the repository:
1$ git clone git://git.mytestbed.net/oml.git 2$ cd oml 3$ cp .git/hooks/pre-commit.sample .git/hooks/pre-commit 4$ chmod 755 .git/hooks/pre-commit
Tabs-vs-spaces is a religious war, which can only be solved by a deliberate dictatorial decision. In this case, I choose to base our choices on jwz. There should be no tab character in source files, and shift width is 2. The link has good advice for emacs and vim users. The source code already has emacs local variables to set the correct style (in particular, indent-tabs-mode: nil).
Identation and Braces
The basic indentation style should be K&R style. For functions, opening brace in column 0; for control structures, opening brace on the first line of the statement. If braces aren't required, use them anyway (Sonar insists on this). Descriptive comments should be Doxygen -formatted, as a block before the function implementation by default, or the prototype in the headers for user-visible or generic API.
1/** Doxygen brief comment. 2 * 3 * Longer Summary 4 * \param a first parameter 5 * \param b second parameter 6 * \return an integer 7 * 8 * \see doxygen(1) 9 */ 10int 11foo (int a, char *b) 12{ 13 char *p = b; 14 if (a < 0 || b == NULL) { 15 return -1; 16 } 17 18 while (*p && (p - b) < a) { 19 *p = toupper (*p); 20 if (*p == '\t') { 21 *p = ','; 22 } 23 p++; 24 } 25 26 return 0; 27}
Switch statements should be indented with the subordinate "case" labels lined up with the "switch" keyword:
1 switch(v->type) { 2 case OML_LONG_VALUE: v->value.longValue = 0; break; 3 case OML_INT32_VALUE: v->value.int32Value = 0; break; 4 case OML_UINT32_VALUE: v->value.uint32Value = 0; break; 5 case OML_INT64_VALUE: v->value.int64Value = 0LL; break; 6 case OML_UINT64_VALUE: v->value.uint64Value = 0ULL; break; 7 case OML_DOUBLE_VALUE: v->value.doubleValue = 0; break; 8 case OML_STRING_VALUE: { 9 if (v->value.stringValue.is_const) { 10 v->value.stringValue.ptr = NULL; 11 } else { 12 v->value.stringValue.size = 0; 13 if (v->value.stringValue.length > 0) { 14 *v->value.stringValue.ptr = '\0'; 15 } 16 } 17 break; 18 } 19 default: 20 logerror("Copy for type '%d' not implemented'\n", v->type); 21 return -1; 22 }
Do-while loops should look like this:
1 do { 2 i *= 2; 3 j++; 4 } while (i < n);
If-elseif-else constructs should look like this:
1 if (x == y) { 2 .. 3 } else if (x > y) { 4 ... 5 } else { 6 .... 7 }
Functions
Function prototypes should have the return type on a first line, and the function name and parameters signature on the following one:
1int 2oml_value_from_typed_s (OmlValue *value, const char *type_s, const char *value_s) 3{ 4... 5}
This makes source files easier to grep. Prototype that is a declaration (not part of the function definition) should however be all on one line. int oml_value_from_typed_s (OmlValue *value, const char *type_s, const char *value_s);
This makes header files easier to read. Note that OML is written in C, not C++: declarations of pointer variables attach the asterisk to the variable name, not the type. However, if your function returns a pointer to something, in the function definition it is permissible to skip the space:
1char* 2oml_type_to_s(OmlValueT type) 3{ 4... 5}
But note:
char *oml_type_to_s(OmlValueT type);
is proper for the declaration prototype in the header file. Note that the following is an abomination, and whilst it does exist in some places in the OML source code currently, they will be subject to ruthless repression when found:
1int 2omlc_init( 3 const char* appName, 4 int* argcPtr, 5 const char** argv, 6 o_log_fn custom_oml_log 7) { 8... 9}
Identifiers
OML is written in C. As a general rule, identifiers should be short and concise. Where this is not possible, they should be as short as possible whilst still being descriptive. If they consist of multiple words, then the words should be separated by underscores. OML is written in C, not Java: there should be no capital letters -- mixed case identifiers are an abomination.
The one exception to this is for typedef names. Because there is only one exception, they stand out, and this is a good thing (tm). For instance:
1static int 2out(OmlWriter *writer, OmlValue *values, int value_count) 3{ 4 int i; 5 OmlValue *v = values; 6 OmlFileWriter *self = (OmlFileWriter*)writer; 7 FILE *f = self->f; 8 if (f == NULL) { 9 return 1; 10 } 11 12 for (i = 0; i < value_count; i++, v++) { 13 switch (v->type) { 14 case OML_LONG_VALUE: 15 fprintf(f, "\t%" PRId32, oml_value_clamp_long (v->value.longValue)); 16 break; 17 case OML_INT32_VALUE: fprintf(f, "\t%" PRId32, v->value.int32Value); break; 18 case OML_UINT32_VALUE: fprintf(f, "\t%" PRIu32, v->value.uint32Value); break; 19 case OML_INT64_VALUE: fprintf(f, "\t%" PRId64, v->value.int64Value); break; 20 case OML_UINT64_VALUE: fprintf(f, "\t%" PRIu64, v->value.uint64Value); break; 21 case OML_DOUBLE_VALUE: fprintf(f, "\t%f", v->value.doubleValue); break; 22 case OML_STRING_VALUE: fprintf(f, "\t%s", v->value.stringValue.ptr); break; 23 default: 24 logerror ("Unsupported value type '%d'\n", v->type); 25 return 0; 26 } 27 } 28 return 1; 29}
This example is instructive. Above I have claimed that mixed case is not allowed except for in typedef names, but the OmlValueU union has members that have mixed-case names (e.g. v->value.int32Value). Doesn't that break the rules? Yes, in fact it does. However, these rules postdate the OML2 API, and we have to preserve backwards compatibility within the 2.x series, so for the moment those names have to stay.
Although OML doesn't follow the Linux kernel coding style, that document does have much sage advice, such as this from Chapter 4:
C is a Spartan language, and so should your naming be. Unlike Modula-2 and Pascal programmers, C programmers do not use cute names like ThisVariableIsATemporaryCounter. A C programmer would call that variable "tmp", which is much easier to write, and not the least more difficult to understand. GLOBAL variables (to be used only if you really need them) need to have descriptive names, as do global functions. If you have a function that counts the number of active users, you should call that "count_active_users()" or similar, you should not call it "cntusr()". Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs. LOCAL variable names should be short, and to the point. If you have some random integer loop counter, it should probably be called "i". Calling it "loop_counter" is non-productive, if there is no chance of it being mis-understood. Similarly, "tmp" can be just about any type of variable that is used to hold a temporary value. If you are afraid to mix up your local variable names, you have another problem, which is called the function-growth-hormone-imbalance syndrome.
Transparent and Opaque Types
Traditionally OML2 has used a lot of typedefs for internal structures and types. I am starting to move away from this. The guidance in the Linux kernel coding style guide influenced my thinking (see Chapter 5).
Typedefs should be used for types that are opaque. Such types should only be manipulated via a well defined API. For instance, the MBuffer and MString types meet this criterion. In this case, the typedef is used to actively hide what the object is, and especially to hide its implementation details.
Types in the API should be opaque, but unfortunately we already put the full structure definitions in the <oml2/omlc.h> header file. This was a mistake because it causes problems if we want to modify the internal implementation details whilst maintaining binary and source backwards compatibility. Adding a member to a struct breaks binary compatibility unless it is added at the end; and even then, if it is used in an array (such as OmlValueU and OmlMPDef) it probably still breaks binary compatibility. Removing a member from a struct can break both source and binary compatibility. For the OML3 API (still only a twinkle in my eye) I am planning to hide almost all of the implementation-specific details behind opaque types (essentially void* pointers).
Internal data structures that are meant to be transparent should not be given typedefs. The Database typedef is an example in the server source code that should probably be removed in favour of referring to the underlying struct directly; we already access its internals directly in the code. The schema.c file is an example of the new policy. It is used to represent measurement tuple and database table schemas, and it is not typedef'd. That is because it is more convenient for client code to manipulate the internals directly. It is used in several ways and places, so a "well defined" API is difficult to, well, define.
Typedefs should also not be used for types that are easily represented using native C types. If a quantity could be an int on one platform and a long on another, then a typedef is appropriate, but otherwise not. The C standard defines portable typedefs for integers of pre-determined widths. If the number of bits (and signedness) used to represent a quantity are important, use the types from <stddint.h> (uint8_t, int32_t, etc.). Otherwise just use simple types. For instance, loop counters should be int, strings should be char*, etc.
Struct and union names for transparent types should be all lowercase with words separated by underscores, just like all other identifiers. For instance:
1struct schema_field 2{ 3 char *name; 4 OmlValueT type; 5}; 6 7struct schema 8{ 9 char *name; 10 struct schema_field *fields; 11 int nfields; 12 int index; 13}; 14 15struct schema *schema_from_meta (char *meta); 16...
Code Documentation
Doxygen documentation of the code is desirable. Important functions should be well documented, with both brief and longer descriptions, along with documentation of parameters and return. Short, straightforward functions can have a brief description only. C-style comments are obviously preferred, with a slash ('\') for Doxygen commands.
Other comments are useful, but overcommenting is not. Be reasonable.
The following should summarise the expectations.
1int frob; /** Frobnication counter */ 2 3/** Frobnicate */ 4void 5frobnicate(void) { 6 frob++; 7} 8 9/** Do something complex. 10 * 11 * This function fiddles with i and c, and frobnicates some more. 12 13 * \param c character string of size i 14 * \param i size of c 15 * \return the number of times we have frobnicated 16 * 17 * \see frobnicate 18 */ 19int 20something_complex(char *c, int i) { 21 /* Skip on the actual work */ 22 23 frobnicate(); 24 return frob; 25}
License boilerplate
OML is MIT licensed by Nicta. The full license comes in the COPYING file at the root of the tree. However, we want to make sure each file is clearly labelled as MIT-licensed and copyrighted by Nicta. In the interest of space, the following boilerplate is to be used at the beginning of each file:
/* * Copyright 20XX-20XX National ICT Australia Limited (NICTA) * * This software may be used and distributed solely under the terms of * the MIT license (License). You should find a copy of the License in * COPYING or at http://opensource.org/licenses/MIT. By downloading or * using this software you accept the terms and the liability disclaimer * in the License. */ Several copyright lines, listing the previous owners/contributors should be kept. == Modelines == To help support a coherent style, let's instruct our favorite, and less favorite editors, by adding the following modelines at the end of each file. {{{ /* Local Variables: mode: C tab-width: 2 indent-tabs-mode: nil End: vim: sw=2:sts=2:expandtab */ }}} == Other languages == === Python === Use a four-space indentation. === Ruby === Follow the Design_Documents OMF practice (https://github.com/bbatsov/ruby-style-guide)