Decision trees made easier
Within a few months of its release, our newest R package, andorR (pronounced “Andorra”), has been downloaded by more than 1,000 users! It is available on CRAN (https://cran.r-project.org/package=andorR) and GitHub (https://epimundi.github.io/andorR/).
In our work supporting government policy decisions, developing surveillance case definitions and analysing diagnostic pathways for complex diseases, we frequently use AND-OR decision trees. The World Organisation for Animal Health criteria for listing a disease (see image) provide a simple example, but these trees can become much more complex.
AND-OR decision trees consist of a set of TRUE/FALSE questions linked by decision nodes. Each node applies either an AND rule (all of its inputs must be TRUE) or an OR rule (at least one input must be TRUE). The nodes combine in a hierarchical tree structure to produce an overall conclusion. These trees provide a transparent, repeatable process that takes multiple factors into account and arrives at an unambiguous decision.
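The following is a minimal sketch of how a fully answered AND-OR tree resolves to a single conclusion. It is written in Python purely for illustration and is not andorR's R interface. The `Leaf` and `Node` classes, the `evaluate` function and the question names `Q1` to `Q4` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    """A single TRUE/FALSE question (hypothetical representation)."""
    name: str

@dataclass
class Node:
    """A decision node that combines its children with an AND or OR rule."""
    rule: str       # "AND" or "OR"
    children: list  # Leaf and/or Node objects

def evaluate(node, answers):
    """Return the TRUE/FALSE conclusion of a fully answered AND-OR tree."""
    if isinstance(node, Leaf):
        return answers[node.name]
    results = [evaluate(child, answers) for child in node.children]
    # AND: every child must be TRUE; OR: at least one child must be TRUE.
    return all(results) if node.rule == "AND" else any(results)

# Hypothetical tree: Q1 AND (Q2 OR Q3) AND Q4
tree = Node("AND", [Leaf("Q1"), Node("OR", [Leaf("Q2"), Leaf("Q3")]), Leaf("Q4")])

print(evaluate(tree, {"Q1": True, "Q2": False, "Q3": True, "Q4": True}))   # True
print(evaluate(tree, {"Q1": True, "Q2": False, "Q3": False, "Q4": True}))  # False
```

Because the tree is nested, the same recursive rule works at any depth. A 40-question tree simply has more levels of `Node` objects.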
The problem is that the questions are not always easy to answer. One tree we have been working on has more than 40 highly technical questions, combined through complex, multi-level AND and OR decision nodes.
We have developed the andorR R package to make the decision process easier and more efficient. The package addresses two key areas: confidence and optimisation.
Users must provide an estimate of their confidence in each response, and the tool combines these estimates to calculate the overall confidence in the conclusion. When faced with a series of challenging questions, users can quickly provide ‘best guess’ answers to reach an initial, low-confidence conclusion. They can then invest more resources in the key questions to ensure that the final decision rests on robust evidence.
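This post does not describe andorR's formula for combining confidence. The Python sketch below therefore uses an *assumed* rule, only to show how per-question confidence can roll up to the conclusion. Each confidence is treated as a probability (0 to 1) that the answer is correct. If an AND node is TRUE (or an OR node is FALSE), every child must be right, so the confidences are multiplied. If an AND node is FALSE (or an OR node is TRUE), one correct supporting child is enough. All names and numbers here are hypothetical.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class Leaf:
    name: str  # hypothetical question identifier

@dataclass
class Node:
    rule: str       # "AND" or "OR"
    children: list  # Leaf and/or Node objects

def evaluate(node, answers, confidence):
    """Return (conclusion, confidence) under an ASSUMED rule, not andorR's."""
    if isinstance(node, Leaf):
        return answers[node.name], confidence[node.name]
    results = [evaluate(child, answers, confidence) for child in node.children]
    values = [value for value, _ in results]
    value = all(values) if node.rule == "AND" else any(values)
    # Children whose result matches the node's result are the ones supporting it.
    supporting = [conf for v, conf in results if v == value]
    if (node.rule == "AND") == value:
        # AND is TRUE or OR is FALSE: all supporting children must be correct.
        return value, prod(supporting)
    # AND is FALSE or OR is TRUE: at least one supporting child must be correct.
    return value, 1 - prod(1 - conf for conf in supporting)

# Hypothetical tree: Q1 AND (Q2 OR Q3) AND Q4, answered as 'best guesses'
tree = Node("AND", [Leaf("Q1"), Node("OR", [Leaf("Q2"), Leaf("Q3")]), Leaf("Q4")])
answers = {"Q1": True, "Q2": False, "Q3": True, "Q4": True}
confidence = {"Q1": 0.9, "Q2": 0.6, "Q3": 0.7, "Q4": 0.8}

value, conf = evaluate(tree, answers, confidence)
print(value, round(conf, 3))  # True 0.504
```

In this toy model, the shaky answer to Q2 does not lower the result, because the OR node is carried by Q3. This shows how a quick first pass can highlight which answers actually matter.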
andorR also optimises the decision process in two ways. In AND-OR trees, not every question needs to be answered; often a conclusion can be reached from the responses to only a small subset of questions. The first optimisation tool guides the user through the process. It identifies which remaining questions will have the greatest influence on reaching a conclusion, given the questions already answered and the responses provided. This saves both time and resources, since there is no need to research a question that cannot affect the final result.
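The reason some questions can be skipped can be illustrated with Kleene three-valued logic, where an unanswered question is `None`. An AND node is already FALSE once any child is FALSE, and an OR node is already TRUE once any child is TRUE, so the questions beneath a decided node no longer matter. The Python sketch below is a hypothetical illustration, not andorR's code. It only identifies which unanswered questions could still change the outcome. It does not reproduce the package's ranking of questions by influence.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    name: str  # hypothetical question identifier

@dataclass
class Node:
    rule: str       # "AND" or "OR"
    children: list  # Leaf and/or Node objects

def evaluate(node, answers):
    """Three-valued evaluation: True, False, or None (not yet determined)."""
    if isinstance(node, Leaf):
        return answers.get(node.name)  # None if the question is unanswered
    values = [evaluate(child, answers) for child in node.children]
    if node.rule == "AND":
        if False in values:
            return False  # one FALSE decides an AND node
        return None if None in values else True
    if True in values:
        return True  # one TRUE decides an OR node
    return None if None in values else False

def open_questions(node, answers):
    """Unanswered questions that could still change this node's result."""
    if evaluate(node, answers) is not None:
        return []  # already decided: nothing below it can change the result
    if isinstance(node, Leaf):
        return [node.name]
    return [q for child in node.children for q in open_questions(child, answers)]

# Hypothetical tree: Q1 AND (Q2 OR Q3) AND Q4
tree = Node("AND", [Leaf("Q1"), Node("OR", [Leaf("Q2"), Leaf("Q3")]), Leaf("Q4")])

print(evaluate(tree, {"Q2": True}))        # None
print(open_questions(tree, {"Q2": True}))  # ['Q1', 'Q4']  (Q3 no longer matters)
print(evaluate(tree, {"Q1": False}))       # False  (decided after one answer)
```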
The second optimisation tool helps users decide where best to invest resources. If confidence in some (or all) questions is initially low, reaching an agreed confidence threshold (for example, above 80%) will require stronger evidence. andorR performs a sensitivity analysis to determine which questions would most improve the overall confidence if they could be answered with stronger evidence.
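One simple form of sensitivity analysis raises each question's confidence to 1.0 in turn, as if strong evidence had been found, and records how much the overall confidence rises. The Python sketch below does this. It reuses the same *assumed* confidence rule as a toy model: confidences multiply when every supporting child must be right, and combine as "at least one correct" otherwise. andorR's actual method may differ. The `sensitivity` function, the tree and the numbers are hypothetical.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class Leaf:
    name: str  # hypothetical question identifier

@dataclass
class Node:
    rule: str       # "AND" or "OR"
    children: list  # Leaf and/or Node objects

def evaluate(node, answers, confidence):
    """Return (conclusion, confidence) under an ASSUMED rule, not andorR's."""
    if isinstance(node, Leaf):
        return answers[node.name], confidence[node.name]
    results = [evaluate(child, answers, confidence) for child in node.children]
    value = (all if node.rule == "AND" else any)(v for v, _ in results)
    supporting = [conf for v, conf in results if v == value]
    if (node.rule == "AND") == value:
        return value, prod(supporting)  # every supporting child must be right
    return value, 1 - prod(1 - conf for conf in supporting)  # one is enough

def sensitivity(tree, answers, confidence, improved=1.0):
    """Rank questions by the gain in overall confidence if each were improved."""
    _, baseline = evaluate(tree, answers, confidence)
    gains = {}
    for question in confidence:
        boosted = {**confidence, question: max(confidence[question], improved)}
        gains[question] = evaluate(tree, answers, boosted)[1] - baseline
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

tree = Node("AND", [Leaf("Q1"), Node("OR", [Leaf("Q2"), Leaf("Q3")]), Leaf("Q4")])
answers = {"Q1": True, "Q2": False, "Q3": True, "Q4": True}
confidence = {"Q1": 0.9, "Q2": 0.6, "Q3": 0.7, "Q4": 0.8}

ranking = sensitivity(tree, answers, confidence)
print([question for question, _ in ranking])  # ['Q3', 'Q4', 'Q1', 'Q2']
```

Note that the least confident answer (Q2) ranks last here, because it does not support the conclusion. The best place to invest effort is not necessarily the weakest answer.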
The package includes several demonstration data sets covering topics from BSE case definitions and UNESCO listing to the diagnosis of multiple sclerosis and ethical investment decisions.
