Library collections
Manually indexing documents for subject-based access is a labour-intensive process that can be automated using AI technology. Algorithms for text classification must be trained and tested with examples of indexed documents, which can be obtained from existing bibliographic databases and digital collections. The National Library of Finland has created Annif, an open source toolkit for automated subject indexing and classification. Annif is multilingual, independent of the indexing vocabulary, and modular. It integrates many text classification algorithms, including Maui, fastText, Omikuji, and a neural network model based on TensorFlow. Best results can often be obtained by combining several algorithms. Many document corpora have been used for training and evaluating Annif. Finding the algorithms and configurations that give the best quality is an ongoing effort. In May 2020, we launched Finto AI, a service for automated subject indexing based on Annif. It provides a simple Web form for obtaining subject suggestions for text. The functionality is also available as a REST API. Many document repositories and the cataloguing system for electronic publications at the National Library of Finland are using it to integrate semi-automated subject indexing into their metadata workflows. In the future, we are going to extend Annif with more algorithms and new functionality, and to integrate Finto AI with other metadata management workflows.
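The remark that combining several algorithms often gives the best results can be illustrated with a minimal sketch. The abstract does not say how Annif merges the scores of its backends, so the weighted mean used here is an assumption, and the function name `combine_suggestions`, the backend names, and the example scores are all hypothetical rather than part of Annif's API.

```python
from collections import defaultdict


def combine_suggestions(backend_results, weights=None, limit=5):
    """Merge per-subject scores from several backends by weighted mean.

    backend_results maps a (hypothetical) backend name to a dict of
    {subject label: score in [0, 1]}. A subject that a backend does not
    suggest counts as score 0 for that backend.
    """
    names = list(backend_results)
    if weights is None:
        # Assumption: all backends are trusted equally.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)

    combined = defaultdict(float)
    for name in names:
        for subject, score in backend_results[name].items():
            combined[subject] += weights[name] * score / total_weight

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Made-up scores from two hypothetical backends for one document.
example = {
    "backend_a": {"libraries": 0.8, "indexing": 0.4},
    "backend_b": {"indexing": 0.6, "machine learning": 0.5},
}
top = combine_suggestions(example, limit=3)
for subject, score in top:
    print(f"{score:.2f}  {subject}")
```

In this made-up example, "indexing" ranks first because both backends suggest it, even though neither gives it their highest score. That is the kind of agreement effect that combining algorithms is meant to exploit.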
Manually indexing documents for subject-based access is a labour-intensive process. We propose using metadata gathered from bibliographic databases to train algorithms that assist librarians in that work. We have developed Annif, an open source tool and microservice for automated subject indexing. After training it with a subject vocabulary and existing metadata, Annif can be used to assign subject headings for new documents. We have tested Annif with different document collections including scientific papers, old scanned books and contemporary e-books, Q&A pairs from an “ask a librarian” service, Finnish Wikipedia, and the archives of a local newspaper. The results of analysing scientific papers and current books have been reassuring, while other types of documents have proved to be more challenging. The current version is based on a combination of existing natural language processing and machine learning tools. By combining multiple approaches and existing open source algorithms, Annif can build on the strengths of individual algorithms and adapt to different settings. With Annif, we expect to improve subject indexing and classification processes especially for electronic documents as well as collections that otherwise would not be indexed at all.
<p>In 2016, new goals were introduced into the negotiations on access rights to online academic journals. One of these goals was to advance Open Access publishing among Finnish researchers. Open Access publishing guarantees optimal visibility and further use of science and research, and it also curbs the increase of publishing costs incurred by the academic community.</p> <p>In Finland, access rights to academic journals are negotiated by the FinELib consortium, a cooperative body of Finnish institutes of higher education, research institutes and public libraries. Finland is striving to negotiate models that enable transition to fully Open Access publishing as rapidly as possible. The goal for the transition period is to strike a deal under which, firstly, the academic community will gain access to a large number of academic publications at a reasonable price, and secondly, researchers will have an opportunity to publish their articles using Open Access options more easily and affordably than before. These goals are strongly supported by the academic community; in 2016, more than 2,700 researchers signed the Tiedonhinta.fi statement, in which they pledged not to participate in academic journals’ editorial boards or peer review processes unless these goals are met.</p> <p>FinELib successfully negotiated agreements with two publishers to decrease the Open Access publishing costs of Finnish researchers. However, consensus on Open Access publishing could not be reached with all publishers. Open Access publishing remains the goal in future negotiations as well.</p>
<p>This paper focuses on the process of multilingual concept scheme construction and the challenges involved. The paper addresses concrete challenges faced in the construction process and especially those related to equivalence between terms and concepts. The paper also briefly outlines the translation strategies developed during the process of concept scheme construction.</p> <p>The analysis is based on experience acquired during the establishment of the Finnish thesaurus and ontology service Finto as well as the trilingual General Finnish Ontology YSO, both of which are being maintained and further developed at the National Library of Finland.</p> <p>Although URIs can be considered language-independent, they do not render concept schemes and their construction free of language-related challenges. The fundamental issue with all the challenges faced is how to maintain consistency and predictability when the nature of language requires each concept to be treated individually. The key to such challenges is to recognise the function of the vocabulary and the needs of its intended users.</p> <p>Open science increases the transparency of not only research products, but also metadata tools. Gaining a deeper understanding of the challenges involved in their construction is important for a great variety of users – e.g., indexers, vocabulary builders and information seekers. Today, multilingualism is an essential aspect at both the national and international information society level.</p> <p>This paper draws on the practical challenges faced in concept scheme construction in a trilingual environment, with a focus on “concept scheme” as a translation and mapping unit.</p>