Technical Report: Static Analysis (Java) in OSS Project TEAMMATES
Static analysis tools have been widely used in TEAMMATES, which is helpful in maintaining code standard, coding quality or obtain even bug-free code. This report explains how static analysis tools for Java have been used currently in the project. In addition, it also explores some additional static analysis rules, which could be enforced gradually in the project.
Existing Static Analysis Rules
Currently, four static analysis tools are used in TEAMMATES for Java. All of them are integrated into Gradle
build task staticAnalysis
in CI (Continue Integration) process. Details of the four tools are as follows.
CheckStyle
CheckStyle
reports violations based on provided XML
file. It is used to enforce stylistic standard. For example, we enforce that right curly brackets ({
) should not be put in a new line.
Macker
Macker
is used to detect violations in high-level design. For example, one of the current enforced rule says that “test cases should not be dependent on each other”. Currently, there are only two rules for this analyser.
PMD
PMD
is used to find design problems such as unused imports. The rule sets of PMD
are much bigger than those in CheckStyle
. Some rules in PMD
only make sense for production code. Therefore, we have another PMD
configuration for main
code.
FindBugs
Only one rule(FindDeadLocalStores
) is enforced in Findbugs
currently. Findbugs
cannot print violations in the build log. It is almost excluded in the CI process because we cannot check which rules are violated in the console when build fails.
Links updated at commit bd97f42 at master
Additional Static Analysis Rules
Beside existing rules, static analysis in Java for TEAMMATES can be explored further. We shall discuss some additional rules in CheckStyle
, Macker
, PMD
and Findbugs
. Concrete configurations or customised checkers will be proposed.
We shall divide the discussion by content into three sections: Coding Standard, Design Principle and Bug Prevention.
Coding Standard
We use coding standard in OSS Generic project for TEAMMATES. Most of the standards have been enforced using CheckStyle
or PMD
. Several of them can be explored further.
Boolean Variable Naming Convention
According to the coding standard, “boolean variables should be prefixed with ‘is’”. There are a few alternatives, which are “has”, “can” and “should”. Besides, it requires us to avoid boolean variables that represent the negation of a thing (isNotInitialized
is better than isInitialized
).
Currently, all the static analysis tools don’t support check of this naming convention. We shall write our own check.
CheckStyle
provides us a way to write customised check. Basically, the check will check name of each boolean variable against the naming convention.
1 | // part of the code (Core Logic) |
Run the check against TEAMMATES ([bd97f42](https://github.com/TEAMMATES/teammates/commit/bd97f4210749b8a58a8285258098c2f91d492099 at master
)) and we will get violation reports for checkstyleMain
and checkstyleTest
For production code, there are 122 violations. Some violations such as participantIsGeneral
are valid. isParticipantGeneral
is more appropriate in this case. Some of the violations may be reasonable as the names may start with allow
instead of is, has, can, should
. Some boolean variables represent the negation of thing such as isNotTeammatesLog
, but there are some false positives. For example, isNotSureAllowed
and isNotificationIconShown
should not be included.
Variable Declaration Usage Distance
In the coding standard, it says that “variables should be initialised where they are declared and they should be declared in the smallest scope possible”. There exists a rule in CheckStyle
for this coding standard: VariableDeclarationUsageDistance
, which just checks the distance between declaration of variables and their first usages.
The default distance of this check is 3
. Here is the graph of number of violations VS distance
. (Based on 8f873 at master)
Individual reports:
checkstyleMain
: Max Distance 3, Max Distance 4, Max Distance 5, Max Distance 6, Max Distance 7, Max Distance 8, Max Distance 9, Max Distance 10checkstyleTest
:
Max Distance 3, Max Distance 4, Max Distance 5, Max Distance 6, Max Distance 7, Max Distance 8, Max Distance 9, Max Distance 10
There are false positives, especially in test code. For example, we may prefer this way, where we define all the variables first.
1 | Instructor ins1inC1S1 = new ... |
However, there are some valid violations. For example, in production code, it is not a good practice to initialise a variable as null
and assign value later (pointsForEachRecipientString
in this case).
Comments Indentation
According to the coding standard, comments should be indented relative to their position in the code. There is a rule in CheckStyle
: CommentsIndentation
, enforces the same amount of indentation for comments and surrounding code.
The rule is enforced in #6814.
Sample violation:
Spelling of Words
The most boring thing for reviewers is finding typos in PR (Pull Request). The reviewer may need to spend unnecessary time finding typos. Also, the submitter gets disappointed. Here is the solution. If we strictly follow the camel case naming convention and use underscore to separate words in constants. We can leave this work to computer bots.
The idea is that we extract words from variable names or method names and check whether it exists in any English dictionary.
There exists a comprehensive English dictionary, which includes around 34 thousands English words. In addition, we could define our own vocabulary such as html
, css
and tooltips
.
Customised CheckStyle
check:
1 |
|
Here are two dictionaries used in the checks
When running the check against master (at 8f8738), we get 658 violations for production code. There are some false positives. However, there are many valid cases.
In addition, we also found frequent occurrence of name1
, name2
, student1
etc. For test code, these should be acceptable. However, for production code, naming in this way would result in confusion.
CheckStyle
report for test code gets 532 violations. Note that a new file (Regex Name) should be added as we now allow naming like name1
, session1
etc in test code.
Some of them are false positive. However, there are valid cases.
Furthermore, we can also see the violation of naming convention.
In this case, instructor1ofCourse1
should be instructor1OfCourse1
(although it is arguable whether to capitalised the o
or not).
Design Principle
API Design
One of the principles in designing API is hiding of the internal implementation to user. In other words, we should avoid using implementation types and use the interface instead. There is a PMD
rule for this: LooseCoupling
. This rule can be applied to TEAMMATES production code.
We could run the check for TEAMMATES production code and get the violation report.
One of them is shown as below (From Logic.java
).
It would be a good practice to change HashMap<String, CourseSummaryBundle>
to Map<String, CourseSummaryBundle>
as this method is an API method.
Abstraction Level
According to the design documentation of TEAMMATES, there are several abstractions, which define how components should interact with one another. Some components may have dependency to each other while other should not touch each other. Currently, we use Macker
to check dependency and only two rules are forced. Apparently, there should be more rules.
1 | <ruleset name="Test cases should not be dependent on each other" /> |
CheckStyle
has one rule called ImportControl
(as its name suggests, we can make certain constraints for the import of certain class). However, it turns out that it is not as powerful as Macker
. The following discussion will propose new rules for Macker
.
We will examine the design one by one.
UI should not touch Storage
Logic should not touch UI
Storage should not touch Logic
Storage should not touch UI
Common should not have dependencies to any packages except storage::entity
Client scripts should be self-contained
Client should not touch UI
client::remoteapi should not depend on client::scripts
client::util should not have dependencies to anything within the client package
- Only *Action can touch Logic API
- Templates are only used in Pagedata
- Pagedata is only used in Controller
- Controller should be self-contained
- Automated actions should be self-contained
- Automated actions should not have dependencies to anything within the UI component
- Logic classes can only access storage::api
- Each logic can only access its corresponding DB (e.g. AccountsLogic -> AccountsDb)
- LogicComponent: BackDoorServlet should not access logic::core/logic::api
- logic::core should not access logic::api
- logic::api/logic::core should not access logic::backdoor
- storage::search should not touch storage::entity
- storage::entity should not depend on anything within the storage component
- Test cases should not be dependent on each other
- Only UI tests can access page object classes
- Only certain test cases can test storage:entity
- Only certain test cases can access GaeSimulation/BackDoorLogic
- BackDoor test driver is only for browser test
Architectural rules for external API calls
- Java Logging API can only be accessed via Logger
- Search API can only be accessed via storage::api and storage::search
- Java Persistence API can only be applied to storage::entity
- Google Cloud Storage API can only be accessed via GoogleCloudStorageHelper
- Task Queue API can only be accessed via TaskQueuesLogic
- Remote API can only be accessed via RemoteApiClient
- JDO API can only be accessed via storage::api and client scripts
- Servlet API can only be accessed via Servlet classes, GaeSimulation, and selected utility classes
UPDATED: The rule is enforced in TEAMMATES since e587b4300bf06afe8c4eb322381f3eb29636683c.
Bug Prevention
Findbugs
Enforcement
Findbugs is a powerful static analysis tool as it can examine binary code rather than source code of Java. Therefore, it can find more “bugs” than PMD
and CheckStyle
.
Findbugs doesn’t support printing violations in console and hence it has been excluded from TEAMMATES. There are only two options for error reporting: html
and xml
. The html
report of FindBugs
has 151 violations and some of them are valid.
This is what it looks like when run ./gradlew findbugsMain
1 | :compileJava UP-TO-DATE |
Printing violations in console is essential for a CI managed project. One possible solution for this is to make findbugs
generate xml
report and write a separate task in Gradle
to print the report in console.
1 | task printFindbugsMainResults << { |
Below is the violation report printed by the task printFindbugsMainResults
.
1 | $ ./gradlew findbugsMain |
Through this, we can integrate findbugs
static analysis tools into TEAMMATES. The developer can run it locally and the CI can run it automatically.
Note that we use filter
to suppress some bugs in FindBugs
. More filters should be added if we confirm they are not bugs.
Usage of Assumption.java
Currently, there exists inconsistency in using of Assumption
and RuntimeException
in production code in TEAMMATES.
Assumption
:
RuntimeException
:
In either way, if the condition fails, the application will be stopped and an error email will be sent to admin.
There is a rule in PMD
named AvoidThrowingRawExceptionTypes
and it indicates that throwing raw exception types is not a practice in software engineering. We may want to use Assumption.assert*
rather than throwing raw exceptions in the production code. Therefore, we should change all RuntimeException
to make use of Assumption
class for production code.
When applying the PMD
rule AvoidThrowingRawExceptionTypes
, the report indicated there are 11 violations for production code. Some of them even have // TODO
tags to ask the programmers to replace the RuntimeException
with Assumption.assert*
.
For test code, we could be more tolerant as we might use RuntimeException
to indicate errors.
It is possible to replace the RuntimeException
with fail()
method. There are only 14 violations for test code by PMD
report.
Analysis
Existing Static Analysis Rule
Through my observation in TEAMMATES’s PR process, the existing static analysis rules work well and detect most of potential issues in PRs effectively. Most of time, there is no need for reviewers to point out stylistic issue. Therefore, we could keep the existing static analysis rule.
Additional Static Analysis Rule
When dealing with new rules, we must be very careful as the side effects of some of the new rules may be outweigh their benefits. Through the above discussion, we can see that some rules are worth enforcing, while others are not in consideration to the amount of effort needed to enforce them. We will looks at them individually.
Coding Standard
It is not worth to enforce the rule for boolean variable naming convention. The intention of the coding standard is to make code more readable. However, the customised check just simply checks whether the boolean variables start with certain prefixes. There are many case that a boolean variable doesn’t start with
is
but makes sense. However, we can keep the checks and run it frequently to check possible violations.It is not worth to enforce the variable declaration usage distance rule. We could find some valid cases to violate this rule. For example, we may want to get all parameters in a controller at first rather than getting them when using them. This practice helps developers know which parameters the controller will use. Besides, for test code, as we have discussed in the previous section, we may want to declare all test data at the beginning. However, this checks could be used to detect violations such as initialising variable to
null
.The
Comments Indentation
rule is already forced in current production code.It is not worth to enforce the spelling of words rule in TEAMMATES. The reason is obviously as the maintenance would be a nightmare. We invent new words everyday, aren’t we? However, this check could be run frequently to check whether there are typos or not.
Design Principle
It is worth enforcing the
PMD
ruleLooseCoupling
, which could make our APIs more flexible and “avoid using implementation types and use the interface instead” is one of rule in API design.It is worth enforcing the new rule added to
Macker
to check any design flaws. It will prevent us from breaking the design accidentally.
Bug Prevention
It is worth enforcing the
Findbugs
plugins in our build process with suppression. FindBug may find “bugs” thatCheckStyle
andPMD
could not find. We now also have the ability to print violations in console.It is wroth enforcing the
AvoidThrowingRawExceptionTypes
rule to give a good software engineering practice. We could be consistent that we useAssumption
to detect bugs and useException
to deal with exceptions.
Conclusion
In conclusion, we should keep the existing static analysis rules and it is worth enforcing most of the additional rules that are discussed above for TEAMMATES. In addition, the not worthy ones could also be an option to find potential stylistic flaws, but we need to put efforts to maintain them.
We are almost done with static analysis tools for Java (at least with CheckStyle
, Findbugs
, Macker
and PMD
), further exploration could be done to make this process more automatic. For example, we can have PR-bot to comment violations. Also, for the spelling rule I have introduced or the boolean variable naming, the PR-bot could make comment in the PR but not necessarily to fail the build.
Original Post from https://xpdavid.github.io/CS2103R-Report/, which was writtern by me
Technical Report: Static Analysis (Java) in OSS Project TEAMMATES