Here we go again! Following the successful project implementation during GSoC 2019, the Hellenic OCR Team and its partners have proposed two exciting new software development projects to the org admin, GFOSS! Interested students are kindly invited to express their interest at any time.
Project A: Implementation of an advanced tool for linguistic analysis
With Xtralingua, the end product of GSoC 2019, as a starting point, the proposed project takes its development to the next level by increasing its modularity and introducing an end-to-end workflow that includes one of the most important aspects of modern text mining: data visualization. Xtralingua takes text as input and calculates a large number of linguistic features as output. The proposed GSoC 2020 project will give it a modular architecture in which the UI is separated from the API layer and additional features can be supported. The core of the project is the development of an integrated data visualization tool with a defined set of standard visualizations needed for text analysis, e.g. word clouds, word frequency plots, topic modeling bubbles, sentiment evolution graphs, etc. The tool adds considerable value for the linguistic community, since most existing tools are black boxes with few or no parametrization options. The end-to-end design shall give full control to non-developers, while the proposed modular architecture makes it easy to expand and transform the tool.
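To make the "text in, features out, visualization at the end" idea concrete, here is a minimal sketch of the data a word frequency plot or word cloud could be fed with. It is not Xtralingua's actual API: the `WordCount` type, the `wordFrequencies` function and the simple punctuation-based tokenizer are assumptions made for illustration only.

```typescript
// Hypothetical shape of one data point handed to a visualization layer.
interface WordCount {
  word: string;
  count: number;
}

// Count word occurrences and return the `limit` most frequent words.
// Tokenization here is deliberately naive: lowercase, then split on
// whitespace and a few common punctuation marks.
function wordFrequencies(text: string, limit: number = 10): WordCount[] {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const counts: Record<string, number> = Object.create(null);
  const tokens = text
    .toLowerCase()
    .split(/[\s.,;:!?"'()]+/)
    .filter((token) => token.length > 0);

  for (const token of tokens) {
    counts[token] = (counts[token] || 0) + 1;
  }

  return Object.keys(counts)
    .map((word) => ({ word, count: counts[word] }))
    // Most frequent first; ties are broken alphabetically for stable output.
    .sort((a, b) => b.count - a.count || (a.word < b.word ? -1 : 1))
    .slice(0, limit);
}

const sample = "the cat and the dog and the bird";
// The two most frequent words are "the" (3 occurrences) and "and" (2).
console.log(wordFrequencies(sample, 2));
```

An array of `{ word, count }` pairs like this is a shape that most charting libraries can consume directly, which is one reason to keep the calculation separate from the UI.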
By the end of GSoC 2020, software development is expected to have produced:
- End-to-end text processing and visualization tool
- Modular architecture enabling streamlined development and extension processes
- Implementation of defined connectors for add-ons
- Use of popular web technology stack to encourage community participation and use
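One possible reading of "defined connectors for add-ons" is a small contract that every add-on implements, plus a registry that runs all registered add-ons on a text. The sketch below shows that idea only; `FeatureAddOn`, `AddOnRegistry` and the two example add-ons are hypothetical names, not part of any existing Xtralingua code.

```typescript
// Hypothetical contract that every add-on would implement.
interface FeatureAddOn {
  name: string;
  compute(text: string): number;
}

// Hypothetical registry: the core tool only knows about the contract,
// not about the individual add-ons.
class AddOnRegistry {
  private addOns: FeatureAddOn[] = [];

  register(addOn: FeatureAddOn): void {
    for (const existing of this.addOns) {
      if (existing.name === addOn.name) {
        throw new Error(`Add-on "${addOn.name}" is already registered`);
      }
    }
    this.addOns.push(addOn);
  }

  // Run every registered add-on and collect the results by add-on name.
  runAll(text: string): Record<string, number> {
    const results: Record<string, number> = {};
    for (const addOn of this.addOns) {
      results[addOn.name] = addOn.compute(text);
    }
    return results;
  }
}

const registry = new AddOnRegistry();
registry.register({ name: "characterCount", compute: (text) => text.length });
registry.register({
  name: "sentenceCount",
  compute: (text) =>
    text.split(/[.!?]+/).filter((part) => part.trim().length > 0).length,
});

console.log(registry.runAll("Hello world. How are you?"));
```

With a contract like this, a new linguistic feature can be added without touching the core, which is the kind of extension process the modular architecture is meant to enable.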
On the technical level, we encourage interested students with basic knowledge of web technologies (React, Node.js) to apply. Students will learn best practices in microservices architecture and gain experience with state-of-the-art development tools and technology stacks.
Students shall work with both professional developers and academic scholars, thus having the unique opportunity to take part in the development of an end-to-end tool of high significance for a wide community of end-users.
Project B: HyperFlow, Workflow management tool for digital transformation in legal tech
At the core of any digital transformation activity is the digitization of the underlying workflows and business processes. Although several professional tools are available, a transparent, open source (and thus configurable) tool for managing complex organizational workflows has yet to be developed. The proposed project leads to the development of an integrated software platform on which automated or semi-automated processes can be built and managed. The platform architecture integrates several components in the form of microservices that can be chained together to form the desired workflow. For GSoC 2020, the basic functionality of the platform shall lead to a proof of concept that digitally transforms critical aspects of a parliamentary control process in the Hellenic Parliament. The process shall be designed to enable decentralized workflow management, monitoring and visualization of data flows, and potentially the integration of further services such as Optical Character Recognition (OCR) via dedicated APIs. The end product of this GSoC 2020 project shall be of particular value to developers in the digital governance domain.
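The idea of chaining components into a workflow can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the HyperFlow design: `WorkflowDocument`, `WorkflowStep`, `chain` and the two example steps are hypothetical, and the steps are plain synchronous functions standing in for services that would normally be called over the network.

```typescript
// Hypothetical payload passed from one step to the next.
interface WorkflowDocument {
  id: string;
  text: string;
  log: string[]; // names of the steps that have processed the document
}

// Hypothetical step signature: take a document, return an updated copy.
type WorkflowStep = (doc: WorkflowDocument) => WorkflowDocument;

// Combine several steps into a single step that applies them in order.
function chain(steps: WorkflowStep[]): WorkflowStep {
  return (doc) => {
    let current = doc;
    for (const step of steps) {
      current = step(current);
    }
    return current;
  };
}

// Stand-in for a text clean-up service.
const normalize: WorkflowStep = (doc) => ({
  ...doc,
  text: doc.text.replace(/\s+/g, " ").trim(),
  log: doc.log.concat("normalize"),
});

// Stand-in for an analysis service that records a word count in the log.
const countWords: WorkflowStep = (doc) => ({
  ...doc,
  log: doc.log.concat(`countWords:${doc.text.split(" ").length}`),
});

const workflow = chain([normalize, countWords]);
const result = workflow({ id: "doc-1", text: "  Written   question  no 1 ", log: [] });
console.log(result.text, result.log);
```

Because every step has the same signature, steps can be reordered, removed or replaced (for example by an OCR connector) without changing the code that runs the workflow.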
By the end of GSoC 2020, the student is expected to have developed:
- Basic platform functionality for the mentioned case study
- Aggregation of different data formats and modules
- Connectors to existing and future open source scripts, e.g. for data scraping and text processing
- State management implementation to facilitate workflow management and configuration
The student is expected to have basic knowledge of the relevant technology stack, which includes NodeJS, TypeScript, ReactJS & MongoDB. Knowledge of Python is an advantage. An initial understanding of parliamentary processes is an asset but is not required.
The parliamentary process in question deals with written questions submitted by Members of Parliament (MPs). The student will have the opportunity to get to know a well-established parliamentary process before dealing with the architectural aspects of platform design. A set of contemporary technology features, such as enterprise integration patterns and microservices, shall be integrated into the design.
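As a rough illustration of what state management for such a workflow could look like, the sketch below models a written question as a small state machine. The state names and allowed transitions are invented for the example and do not describe the actual procedure of the Hellenic Parliament; `QuestionState`, `allowedTransitions` and `transition` are likewise hypothetical.

```typescript
// Hypothetical lifecycle states of a written question (illustrative only).
type QuestionState = "submitted" | "forwarded" | "answered" | "archived";

// Hypothetical transition table: for each state, the states it may move to.
const allowedTransitions: Record<QuestionState, QuestionState[]> = {
  submitted: ["forwarded"],
  forwarded: ["answered"],
  answered: ["archived"],
  archived: [],
};

// Return the next state if the move is allowed, otherwise throw.
function transition(current: QuestionState, next: QuestionState): QuestionState {
  if (allowedTransitions[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${current} -> ${next}`);
  }
  return next;
}

let state: QuestionState = "submitted";
state = transition(state, "forwarded");
state = transition(state, "answered");
console.log(state);
```

Keeping the transition table as plain data means the workflow can be reconfigured by editing the table, without changing the code that enforces it.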