This article defines a Framework for Machine Translation Evaluation (FEMTI), which relates the quality model used to evaluate a machine translation system to the purpose and context of use of that system. Our proposal attempts to assemble, into a coherent picture, previous attempts to structure a domain characterised by overall complexity and local difficulties. In this article, we first summarise these attempts, then present an overview of the ISO/IEC guidelines for software evaluation (ISO/IEC 9126 and ISO/IEC 14598). As an application of these guidelines to machine translation software, we introduce FEMTI, a framework consisting of two interrelated classifications, or taxonomies. The first classification enables evaluators to define an intended context of use; its links to the second classification then generate a quality model (quality characteristics and metrics) relevant to that context. The second classification provides definitions of the various metrics used by the community. Further on, as part of ongoing, long-term research, we explain how metrics are analysed, first from the general point of view of “meta-evaluation”, then through examples. Finally, we show how consensus on the present framework is sought, and how feedback from the community is taken into account in the FEMTI life-cycle.