Interdisciplinary science at its best – Be part of it!
On the eve of 2018 a new crowdsourcing platform came to life. The Hellenic Optical Character Recognition (OCR) Team is the first scientific crowdsourcing initiative aimed exclusively at the processing and study of parliamentary data. Naturally, the application, handling and further development of OCR processes lie at the core of our endeavor.
This is an informal but powerful initiative built on the simple idea that a decentralized group of people can be more than the sum of its individual members. We are a dedicated and rapidly expanding circle of scientists from a wide spectrum of disciplines, including academics, officials and students from several institutions, such as the Hellenic Parliament, the National and Kapodistrian University of Athens and the Democritus University of Thrace.
Every newcomer receives basic training upon joining the group, while more experienced members, called ‘mentors’, provide peer-to-peer advice and support. Team members process the parliamentary texts assigned to them at their convenience and at their own pace. The finished text units, called ‘packages’, pass through a first-level quality check by the mentors and are then forwarded for scientific examination. Text processing follows a well-defined, streamlined procedure that was developed to build high-quality corpora of parliamentary relevance. Only publicly available textual data from acknowledged sources are used.
The study of these corpora opens up substantial research opportunities. The digital content is available in an open and structured format, such as XML (eXtensible Markup Language), which enables the use of novel tools and methods from the exciting field of computational linguistics. Moreover, the availability of unified and verified corpora allows several formerly distant areas of research, such as history, political science and linguistics, to be interlinked, thus opening up new horizons in the understanding of parliamentary information and discourse.
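To give a flavor of what a structured format makes possible, here is a minimal sketch in Python. It assumes a purely hypothetical package layout: the element names `package`, `sitting` and `speech`, the `speaker` attribute and the sample content are illustrative inventions, not the schema actually used by the team. Under those assumptions, it counts the words spoken per speaker in one package.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical package layout: element names, attributes and content
# are illustrative assumptions, not the team's actual schema.
SAMPLE_PACKAGE = """\
<package id="demo-001">
  <sitting date="2000-01-01">
    <speech speaker="Speaker A">The sitting is opened.</speech>
    <speech speaker="Speaker B">I request the floor.</speech>
    <speech speaker="Speaker A">You have the floor.</speech>
  </sitting>
</package>
"""


def words_per_speaker(xml_text: str) -> Counter:
    """Count whitespace-separated words per speaker in one package."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() walks the whole tree, so speeches are found at any depth.
    for speech in root.iter("speech"):
        speaker = speech.get("speaker", "unknown")
        text = speech.text or ""
        counts[speaker] += len(text.split())
    return counts


if __name__ == "__main__":
    for speaker, n_words in words_per_speaker(SAMPLE_PACKAGE).most_common():
        print(f"{speaker}: {n_words} words")
```

Because every speech is explicitly marked up, a simple count like this needs no guesswork about where one contribution ends and the next begins; the same structure could feed more advanced computational linguistics tools.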
We are looking for interested individuals from any background to further expand our dynamic interdisciplinary team.
We need you!
We offer training, a platform for personal and scientific development and the opportunity to work for a higher cause in a state-of-the-art scientific field. And it gets even better: all of that is free of charge! We look forward to welcoming you at one of our upcoming meetings!
The Hellenic OCR Team plans to organize an open hackathon in 2018 as a next step to acquaint our young and dynamic team with new methods, strengthen the bonds between our members and foster their commitment to our common goals. If you are still not convinced, see here 7 reasons why you should go to a hackathon!
Contact
Prof. Giorgos Mikros, Department of Italian Language and Literature, National and Kapodistrian University of Athens, gmikros@gmail.com
Dr Fotis Fitsilis, Head of Department, Scientific Documentation and Supervision, Scientific Service, Hellenic Parliament, fitsilisf@parliament.gr