MOTIVATIONS FOR ANALYSIS PRESERVATION AND REPRODUCIBILITY

The reproducibility of published results is one of the cornerstones of the scientific method: if a group of scientists has discovered something genuinely new about the world, another group should be able to follow their methodology and reproduce the result.[1] This is generally acknowledged by both individual scientists and funding agencies. Indeed, the ability or otherwise of scientists to reproduce each other’s work is itself an active field of research.

These considerations apply to High Energy Physics (HEP) as much as to any other field. However, some of the specific features of HEP mean that reproducibility in HEP takes on significantly different forms from those in many other fields. First, the scale, cost, and construction time of HEP experiments mean that it is rarely possible for a new group of physicists to build a dedicated experiment simply to reproduce an existing result, however interesting it may be. Secondly, the size of HEP collaborations means that there are generally several quasi-independent[2] groups of scientists working on any given analysis, and as part of this work they naturally try to reproduce and improve upon each other’s work.

For these reasons, analysis reproducibility in the HEP context has certain specific features and addresses several knowledge-transfer problems that occur on very different time-scales. In particular, because the detector and analysis documentation internal to a collaboration can stretch to thousands of pages, it is not possible to describe a HEP analysis fully, even in the most verbose journal paper, in a way that would allow it to be reproduced later. For this reason, analysis reproducibility in HEP is inextricably linked to the _preservation_ of analysis datasets and code, in a way that is not true of many other scientific fields.

The problems addressed by analysis preservation and reproducibility in HEP are:

Collaborative working: The use of an integrated analysis preservation framework gives all members of the analysis team access to the full software suite. This encourages continuous development and integration, saving time because analysts do not have to wait for others to finish their contributions. A significant benefit will be that new students get running examples of analysis code when they join an analysis or continue work from their predecessors. This will improve their learning experience and help them become productive faster.

Knowledge preservation during review: On several occasions, changes requested by reviewers caused significant problems because parts of the analysis procedure had been lost. The delay in the Δmd analysis between the conference note and the paper is one example.

Knowledge transfer to other analysis teams: When updating an analysis, the first task is to restore the analysis framework of the previous iteration. This should be painless, but often is not. Considerable time was, for instance, wasted re-implementing the Bs → μ+μ− analysis software of Ref. when preparing for Ref. The same applies to groups performing similar analyses: much effort is spent in LHCb re-implementing tools that could be shared. Such sharing should be particularly encouraged for analyses where a control (or signal) mode is shared. It is also not rare that comparing results that appear inconsistent is more complicated than it should be. The comparison of Refs. and on B → K+π−μ+μ− decays is a recent example, which resulted in the need to publish an erratum.
Knowledge transfer to “future generations”: While this is the original motivation for analysis preservation, there is no LHCb experience with it yet. There are, however, many examples in the past of analyses that were difficult to replicate. In some cases, the use of RECAST or similar frameworks that allow new models to be tested against a preserved analysis is also desirable.

Having introduced the problem, Sec. [sec:preservation] will now define the scope of what we mean by analysis preservation, before Sec. [sec:cernanapresinfra] introduces the CERN infrastructure dedicated to analysis preservation and reproducibility. Section [sec:roadmap] presents a set of best practices for analysis preservation. We also point out the benefits an analysis team gains from adopting these practices in their day-to-day business during the development and review of an analysis. Finally, Sec. [sec:techniques] contains recommendations on which technologies and tools should be used to implement the outlined practices. These recommendations are based on an extensive evaluation of different technologies by representatives of the physics working groups and take into account the global CERN strategy for analysis preservation.

[1] Here and throughout we are primarily concerned with the reproducibility of experimental results, although there are theoretical domains, particularly those involving large-scale computing such as Lattice QCD, where similar considerations may apply.

[2] All work within HEP collaborations relies on shared analysis and detector calibration tools, and in this sense analysis groups inside a collaboration can never be truly independent of each other: there is always potential for shared mistaken assumptions to creep in and bias everyone’s work. On the other hand, the international nature of collaborations means that such analysis groups are generally not only fully independent in terms of funding, but also have an active interest in finding problems in each other’s work in order to assert their own primacy to the outside world.