AGS Engine Coding Conventions
As we're moving forward with more developers starting to get into the engine code, we need to define a well structured set of conventions which everyone can follow and conform to. This will promote consistent programming from anyone contributing to the source. Anyone is free to voice their opinion about these conventions, and it will be modified according to popular trend. As a point of reference, the Google C++ Style Guide as well as ScummVM's Code Formatting Conventions and Coding Conventions have been used.
AGS Engine Coding Conventions Draft 1.0 (PROPOSED)
Following are the AGS Engine Coding Conventions. These definitions are not set in place as absolute rule, but as guidelines that should be followed to promote consistency and uniformity throughout the source code of the AGS run-time engine. Feel free to address any of these points if you think they should be reviewed or reconsidered. These conventions will constantly be under review to ensure that they meet the current needs of those who are actively contributing to the AGS engine source code.
Formatting
The following conventions related to formatting are intended to promote code readability. Anyone contributing to the source code of the AGS engine would be asked to conform to these conventions to ensure that anyone reading it can follow your meaning. These conventions are not strictly enforced rules, but please use discretion if straying away from them.
Spaces vs. Tabs
Tab breaks should not appear in the source files, and should be converted to space characters. Each level of indentation should increase by 4 spaces. You should set the preferences for indentation appropriately in your editor. For other uses of whitespace see #Whitespace.
Comments
Generally, line comments (//) should be preferred over block level comments (/* */) as block comments cannot be nested.
Braces
"Curly braces", { and }, should occupy a line by themselves (trailing comments are acceptable also). They should be at one level of indentation lower than the code they contain. This helps as a vertical guide (in addition to the indentation) in finding where a particular block of code starts or ends.
{ // opening brace // this code is inside the braces } // closing brace
Curly braces may be omitted after conditional or loop statements if there is only one line of code in the following block. In case of combined if/else statement only omit curvy braces if each of the blocks consist of one line.
Whitespace
There are specific cases where whitespace helps improve readability of code, beyond where it is required. Use common sense and discretion, but the following are some examples of proper whitespace usage.
Horizontal Whitespace
Horizontal whitespace should be used appropriately for indenting lines, and spacing out the code on a line. You should never end a line with trailing whitespace. Whitespace should also be used in the following ways:
- Surrounding conventional operators, class inheritance, and ternary operators:
int a = 42; a *= c + d; class Derived : public Base { }; a = (true ? b : c);
- To separate a C++ reserved keyword from an opening parenthesis:
if (true) { } while (false) { }
- After a comma:
int a, b, c; function(a, b, c);
- After a semicolon, if there is more on that line (including comments):
for (int i = 0; i < 42; ++i) { } int id; // used to store the identification number
In other cases, discretion should be used, and horizontal spacing used sparingly (except of course where required).
Vertical Whitespace
Vertical whitespace should be used sparingly to separate functions or other large blocks of code, to help maintain readability. You should not have several blank lines at the start or end of a function, or in-between statements, declarations, and so forth. Use discretion as necessary, but bear in mind that the goal is to make the code readable.
Each file should end with a blank line as some compilers require it or may warn if they do not.
Naming Conventions
The user types: classes, structs, enums and typedefs, - as well as global entities such as namespaces, global constants and global variables should be named using upper "upper" CamelCase notation.
Class methods and public data fields should also follow CamelCase notation. Private class fields should be named using lower camelCase and prefixed with an underscore ('_') to distinguish single-worded members from local variables.
Local variables and function arguments should be named in all lowercase with underscores separating parts of name when necessary (but not as a prefix).
Names should be as descriptive as possible, without being overly verbose. Functions particularly should indicate what they actually do, so a name such as "GetID()" would be preferred over simply naming the function "ID()". Bear in mind that some compilers may have issue if using names that are overly long, so use discretion to determine the best names.
Generally there aren't any specific restrictions on file names, but they should give an idea of what's in the file. C++ source files should be named with a ".cpp" extension and C++ header files with a ".h" extension. No file names (stored in the same location) should differ from each other solely by case.
Namespaces
By convention, a namespace does not increase the indentation level. The namespace itself should have leading and trailing blank lines to help make this distinction clearer. For added clarity, the closing brace of a namespace should have a trailing comment with the name of the namespace itself.
namespace Namespace { // code here, indentation level still at 0 } // Namespace
Function Declarations and Definitions
Function declarations should typically not span more than one line, unless the parameter list inhibits readability. Function declarations should include parameter names, that match the parameter names of the function definition. If a parameter is nameless, a comment should notate what the parameter is for. When defining a function, the same general idea applies, that the return type, function name, and parameter list, should typically not span multiple lines. In the event that the parameter list should need to expand to a new line, it is preferred that the first parameter of the new line be aligned to match the indentation of the first parameter of the function. If this is not reasonably feasible (for example, if the function name is very long this may be problematic), then it is preferred to have each parameter on a separate line, with 4 spaces of indentation, preceding the opening brace:
void DoThis(int param1, int param2, int param3) { } int DoThat(int param1, int param2, int param3, int param4, int param5, int param6, int param7, int param8) { } char* DoTheOtherThingThatHasALongName( // opening parenthesis should always be on the same line as function name char *c, int len, bool destroy) // closing parenthesis should be on the same line as the last parameter { }
Function Parameter Ordering
Generally, function parameters will be listed in order of input parameters, in-out parameters, and then output parameters. This will help provided added clarity to your functions. You should always document the intent and purpose of your parameters.
Switch Statements
In the case of switch statements (pun very much intended!), the labels for each case should not be indented from the switch block itself, but the lines for that case should be. Any break or return statements should be at the same level of indentation as the lines inside each case. The default case should always be provided, even if it should never be reached (if this is the case, consider using assert or simply adding a comment). Fall-through from one case to another should be avoided unless there is no other code for that case. If not allowing fall-through would require duplication of code or cause other issues, the fall-through should be specifically noted:
switch (var) { case 1: DoSomething(); break; case 2: case 3: DoSomethingElse(); break; default: // this default case can't actually happen, but is provided for clarity and to prevent compiler warnings break; }
Empty Loops
If a loop statement is left empty, it should be clear that this is the intention. In this case, the opening and closing braces may appear on the same line as the loop statement (typically with no space in-between). Alternately, you could use continue instead. This is to prevent confusion and promote code readability:
for (int i = 0; i < 42; i++) {} while (false) continue;
Pointers and References
For pointers and references, the preferred convention is to have the asterisk or ampersand beside the name. Having the asterisk or ampersand beside the type is also accepted, but may cause confusion if declaring more than one pointer or reference. Having whitespace on both sides of the asterisk or ampersand should be avoided. If attaching to the type rather than the name, please try to be consistent throughout the entire file.
Boolean Expressions
Boolean expressions should always be written in an invariant fashion. That is, whether the compiler reads the operators left-to-right or right-to-left, the result should always be the same. Usage of parenthesis is encouraged to prevent variance or confusion.
If you are writing a boolean expression that needs to span multiple lines to preserve readability, be consistent in where you split the lines at. It is preferred in this case that each line end with an operator (&& or ||).
Templates
When defining a template function or class, the template keyword, angle brackets, and template parameters should occupy their own line, at the same indentation level as the following definition. There should not be a space between the template keyword and the opening angle bracket. Generic parameters should prefer the verbiage "typename" vs "class".
template<typename T> void MyTemplateFunction() { // blah! }
C++ Features
For the sake of maximizing compatibility between different development platforms, compilers, and run-time system architectures, certain features of C++ should be limited in how they are used or should be avoided altogether. Consideration should also be given to optimizing the run-time, introducing further restrictions.
Run-Time Type Information (RTTI)
Run-Time Type Information (RTTI) allows for querying of the type of an object at run-time. Executing code should not need to rely on this. Consider using virtual methods or redesigning the class as appropriate. RTTI can also create a notable increase in binary size.
C++11 (formerly C++0x)
C++11 is now the official "standard" for C++ compilers, but it may not yet be fully supported by all compilers. The AGS engine code has primarily (exclusively?) to date been developed with Microsoft Visual Studio 2008 which has very limited to no support for most C++11 features. We should try to avoid writing code that is incompatible with C++11, but should not yet use or rely on its libraries or language extensions. It may be useful to instead include some utility classes in the mean-time (as an example, a comparable definition could be supplied to C++11's std::nullptr_t with minimal effort and code).
Boost and other Libraries
Currently no features of Boost (or other similar libraries) should be used. We will need to review this at a later time, and may consider writing our own utility classes instead. If library code is used that is not consistent with our own coding conventions, a utility/wrapper class should be used.
Preprocessor Macros
Wherever possible, preprocessor macros should be avoided. Instead consider using an inline function, const variable/reference, or enumeration. Outside of utility classes where they may be required for portability or other reasons, code should typically never be conditionally compiled. Instead of conditionally compiling something in one of the main source files, first consider if a utility class could be implemented to abstract the calling code (this particularly will help prevent huge/jarring #ifdef blocks in the main line of code).
Variable-Length Arrays and alloca()
Variable-length arrays and alloca are not part of the C++ Standard and can lead to bugs which can be difficult to find. A utility class or library should be implemented or included to provide similar functionality.
NULL and Literal 0
C++ does not provide a strongly typed null pointer and will implicitly cast the literal 0 to any pointer type (C++11 defines std::nullptr_t). Most compilers will define NULL as the literal 0. This can lead to confusion and ambiguity if the literal 0 is passed to an overloaded function that takes integral and pointer types as the first parameter. Instead, a strongly typed null pointer type should defined, implicitly convertible to any pointer type or pointer-to-member type (but not implicitly convertible to integral types).
To further prevent ambiguity, the literal 0 should only be used with integral types, 0.0 should be used for real types, and the char-literal '\0' where appropriate.
Function Overloading
Function overloading is allowed, but some discretion should be used. Generally, you should avoid overloading functions only by parameter type as it can be confusing as to which overload is being called. If the number of parameters has also changed this isn't an issue, but it can improve the clarity of the code to use separately named functions instead if only the parameter type is changing.
Default Arguments
Default function parameters, while useful, can also lead to confusing or undesired behavior if the defaults are not well understood. Generally we should try to avoid using them for this reason. If you do include default arguments, it should be clearly documented (and obvious to the caller) what the default behavior will be.
Use of const
Whenever it makes reasonable sense to do so, const should be applied if it does in fact apply. If a function will not modify one of its arguments, the argument should be const. Class methods should be const provided that they do not modify any of the class members, do not call any non-const class methods, and do not return a non-const pointer or reference to a data member. In the case of pointers it is generally not necessary to declare the pointer itself as const, though doing so is also acceptable.
Pointer and Reference Parameters
In a function call, a reference parameter will look homogeneous to a pass-by-value parameter. For this reason you should generally avoid non-const reference parameters, preferring a pointer instead. Of course that may imply that NULL may be an acceptable value even when it is not. Use discretion in these cases, and document the behavior of your parameters appropriately. Input-only parameters should always be const, and should rarely be pointers (one exception being if NULL is an acceptable value).
Exceptions
Exceptions should not be used. Some compilers and run-time systems may not have proper support for exception handling. Exceptions create a notable overhead in the size of the binary. Exceptions can also cause the program to terminate with no useful error message if they are not handled properly, and could lead to objects not being properly destructed. Utility classes for handling errors, appropriate function return types and values, error codes, and assertions should instead be used to keep the code running in a clean and consistent state. Any included libraries that use exceptions should have wrapper classes provided that will handle any exceptions that may be thrown.
In the case of an error, the preferred convention is to terminate with an appropriate error code as soon as possible. You should not have large blocks of code contained within if-statements just to by-pass the error conditions; instead check for the error and abort the function before executing the rest of the code.
Using sizeof
When using sizeof, prefer checking against a variable of a given type rather than the type itself. This will help prevent issues of the type of the variable is later changed. Checking against a type isn't prohibited, but you should do so with caution to prevent the code falling out of sync if the type being used is later changed.
Integer Types
C++ has several built-in integer types, but the size of them is not well defined. Generally you should use only "int", but if you need an integer of a specific size you should use the appropriate type from <stdint.h>, such as int64_t. You should not use the unsigned keyword or unsigned types to indicate that a value will never be negative, as this can lead to bugs or confusing code (consider a loop checking the iterator as being greater or equal to 0). Instead use assertions to ensure a value is not negative. It is acceptable to use unsigned types to represent a bit pattern or if you need twos-complement overflow. Where appropriate other types like size_t and ptrdiff_t can also be used.
Preincrement and Predecrement
When a variable is incremented or decremented, the postfix operators (i++ and i--) will require a copy of the value to be made, while the prefix operators (++i and --i) do not. For non-scalar types (iterators, etc.) this could incur significant overhead. For scalar (non-object) types, there's no reason to prefer one over the other, but for the sake of conformity. We should avoid using the postfix operators for increment and decrement operations, preferring the prefix operators instead.
Casting
For maximum clarity and portability, C-style casts should not be used. Of the C++ casts, only static_cast and const_cast should be used in production code. dynamic_cast relies on RTTI which is not allowed. reinterpret_cast is unsafe, and generally not portable due to platform-specific interpretation. const_cast should only be used by utility classes for working with library code that expects non-const parameters but does not actually change the value itself.
Streams
As with any feature, there are benefits and drawbacks to using streams. Using streams means that you do not have to specify the type of object, but at the same time introduces a level of ambiguity which could be misleading or confusing. In keeping with making the code as clearly defined as possible, streams should be avoided.
Inline Functions
Inlining a function can save the overhead of a function call, but should not do so at the expense of binary size. Generally if a function is longer than about 10 lines, includes loop statements, or is recursive, it is not a good candidate for inlining. You should also enable compiler warnings (where applicable) if a function is marked as inline but cannot be inlined.
Scoping
With regards to scoping, things should typically be declared in the narrowest scope possible without impairing usage or accessibility.
Namespaces and using Directives
Namespaces help prevent collisions and ambiguity. A using directive will remove this layer, so use some discretion when using one. No using directive should be included in a header file, and you shouldn't use it to avoid typing the namespace name only a few times. There's no hard set rule on this, but keep in mind that for the extent of the file in question, a using directive will pollute the global namespace.
Nested Classes
Nested classes are allowed, but keep in mind that they are inaccessible without the full definition of the outer class. This could be impacting if you're trying to forward declare a nested class. Consider making the classes separate, or using namespaces, unless the nested class is actually part of the structure of the outer class.
Global and Static Variables
Global or static instances of a class type are strictly forbidden due to indeterminate order of construction and destruction. This can lead to bugs which can be difficult to track down, particularly between different systems or architectures. Also, static instances within a function are not accessible outside of the function, which means their values cannot be easily reset (it would require each function with a static instance to have a means of calling it explicitly for the purpose of resetting the instance; also these functions would need to be cached to ensure they could all be reset as needed, further complicating them). Objects of a POD type (chars, ints, floats, pointers, and arrays/structs of POD) are allowed to have static storage duration, but of course should not exist in the global namespace.
Functions
Except where required (e.g., "main") global functions in the global namespace should be avoided. Use static functions where appropriate, but don't create a new class just to group static functions together (use a namespace instead).
Local Variables
Local variables (within a function) should generally be initialized as part of the definition, and should be defined as close to their first usage as possible (within reason). In most cases, things should be defined within the narrowest available scope, but an exception may be made in the case of loops. If a variable is defined inside of a loop then it will be created and destroyed with every iteration of the loop. Particularly when working with class types this may impact performance, so it is acceptable to define the variables outside of the loop using them when it makes sense to do so.
Header Files
Every header file should have predefined macros to prevent multiple inclusion. The preferred naming convention for these macros is two underscores, optionally the path followed by another underscore, the file name, underscore, and a trailing capital letter 'H'. By convention, the #endif should also have a trailing comment indicating the name of the macro.
#ifndef __MYHEADER_H #define __MYHEADER_H // your code here #endif // __MYHEADER_H
Classes
Classes are a great tool for many reasons, but as with any tool, there are things to consider when using them. This section will cover how classes should (and shouldn't) be used.
Classes vs. structs
In C++ a "class" and a "struct" are functionally equivalent in most cases, with the prime exception being the default access specification (default is private for a class, public for a struct). To promote consistency, a class should be used for any non-POD type, while a struct should always be a POD type. This will make it easier to distinguish between POD and non-POD types. Throughout this document it is acceptable to substitute "struct" in place of the term "class" where applicable.
Access Specifications
Access specifiers (public, private, protected) should never be omitted (even for structs). For classes, data members should typically be either private or protected, with public accessors (read as: getters and setters) as needed. Accessors are not required for any const values.
Interfaces
C++ does not have a mechanism for defining an interface in the same sense as languages like C#, though it can be useful to define classes with similar usage.
A class can be defined as an interface (for our purposes) provided that it does not inherit from any non-interface classes, does not have any non-static data members, has a virtual destructor (but not a pure virtual destructor), and has only functions that are both public and pure virtual specifier ("= 0"). If a constructor is provided (optional) it must not have any parameters, and must be protected.
By convention, any interface classes should be named with the suffix "Interface" (e.g., "ClassInterface" or "FooInterface").
Inheritance
Inheritance should be limited to class interfaces and cases where the derived class "is a type of" the base class. If you can't logically define the derived class as a type of the base class, then you shouldn't be using inheritance. There are also other things to consider with regard to inheritance, such as the fragmentation of the class definition (some things defined in the derived class, some in the base class, etc.). Inheritance also prevents a derived class from changing the behavior or implementation of a base class function that isn't defined as virtual.
Generally you shouldn't inherit from multiple base classes except interfaces.
All inheritance should be public. If you can't use public inheritance, then the derived class probably doesn't match the "is a type of" rule for the base class. You can use private data members as an alternative in this case.
Virtual Methods
Keep in mind that if your class defines any virtual methods, the destructor should also be virtual. When overriding virtual methods, explicitly declare the method as virtual in the derived class. This will improve readability and help prevent class fragmentation.
Operator Overloading
Operator overloading is allowed, but that doesn't mean you should just overload all of them for every class. Operators should be overloaded where it makes logical sense to do so, and it is clear what is taking place. Equality and inequality operators, as an example, should typically perform memberwise comparison of two objects. Overloading these operators makes sense for POD structs, but may not always make sense for complex classes. Also keep in mind that when debugging or changing overloaded functions, it is significantly easier to look for an appropriately named function instead of looking for the overloaded operators themselves.
Doing Work in Constructors
Since constructors do not return a value and exceptions are prohibited, there's no useful way for a constructor to indicate to the caller that initialization of the object was unsuccessful. For this reason, you should generally avoid putting any non-trivial code in your constructors, and should consider supplying an Initialize function instead. In this case, the constructor should construct the object into an "invalid" state which can then be updated by the Initialize function (which can return an appropriate error code, etc.). Typically the constructor itself should do little more than copy data into the data members.
Default Constructors
If your class has data members but does not have any constructors defined, then the compiler will generate a default constructor for you. This may not always initialize your objects into a consistent or sensible state. You should always declare at least one constructor for any classes that have data members to ensure that they are initialized appropriately.
Copy Constructors
Unless you specifically need it, you can disable the copy constructor (and equivalent assignment operator) for your class, which will prevent implicitly passing by value. Generally classes should be passed by reference or pointer, but you can also consider providing a separate copy function as needed. This can make it clearer that you are specifically copying the object by value. The copy constructor and assignment operator can be disabled simply by forward declaring them but never actually providing a definition. This will raise a compiler warning if you try to invoke the function.
Explicit Constructors
Explicit constructors can be used to prevent implicit conversion if a constructor is defined with only one parameter. This implicit conversion can be confusing or misleading if not intended, so if you are defining a constructor taking only one parameter (other than copy constructors, where applicable), it should generally be defined as explicit. This will not of course change the way the constructor is actually being used, it will simply prevent it being silently invoked when not expected.
Closing Words
Finally, we have reached the end. Again, these definitions are not set in place as absolute rule, but as guidelines. Use common sense! Remember that the end goal here is to keep the code in a clean and consistent state so that we do not end up again in the same situation we find ourselves just now beginning to emerge from. This isn't meant to say anything against CJ, he worked singlehandedly on the program for over a decade, it's little wonder that inconsistencies should slip in. Still, if we can all agree to a common ground for our coding style, we can future-proof our code, regardless of who's contributing it!
Thank you for your time and consideration!
Sincerely,
The AGS Project Administrators and Engine Source Code Contributors