Integration of Machine Learning and Citizen Science to Address the Challenges of Public Engagement and Data Validation

Ingénierie et Architecture

Maryam Lotfian

The number of citizen science (CS) projects has grown significantly in recent years, owing to technological advancements. One important aspect of ensuring the success of a CS project is to consider and address the challenges in this field. Two of the main challenges in CS projects are sustaining participation and improving the quality of contributed data. Despite the studies that have been conducted to address these two challenges, there is still a need for new approaches, one of which is the use of artificial intelligence (AI) and machine learning (ML) in CS projects.

Therefore, the objective of this thesis was to investigate the integration of ML and CS, as well as the role of this integration in addressing CS challenges. A comprehensive review conducted in this study of motivational factors in CS projects indicated that interest in learning about science and receiving feedback were strong motivations among participants in the majority of CS projects. Typically, experts verify the data and provide feedback to participants. However, due to large amounts of data, this manual data verification can be time-consuming. Thus, in this research, it was investigated how the integration of ML and CS can, on the one hand, automate and speed up the data validation process, and on the other hand, increase participant engagement and sustain participation by providing real-time informative feedback.

To that end, a biodiversity CS project was implemented with the goal of collecting and automatically validating observations as well as providing participants with real-time feedback. ML algorithms were trained to model species distribution using environmental variables (e.g., land cover) and species data from an existing CS project, and then to validate a new contributed observation based on the likelihood of observing a species in a specific location. Furthermore, volunteers were given real-time feedback on the likelihood of observing a species in a particular location, as well as species habitat characteristics. Moreover, a user experiment was conducted, and the results indicated that participants with a higher number of contributions found the real-time feedback to be more useful in learning about biodiversity and stated that it increased their motivation to contribute to the project. Besides that, as a result of automatic data validation, only 10% of observations were flagged for expert verification, resulting in a faster validation process and improved data quality by combining human and machine power. Finally, based on the findings of the experiments and the discussions that followed, we made some recommendations for CS practitioners to consider before designing a new project or improving an existing one.
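
The validation step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis trained ML algorithms on environmental variables, whereas this example substitutes a simple smoothed frequency estimate of how often a species has been recorded in each land cover class. The species names, the records, the 0.2 threshold, and the function names fit_likelihoods and validate are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical historical records (species, land cover class), standing in
# for species data from an existing CS project.
HISTORY = [
    ("robin", "forest"), ("robin", "forest"), ("robin", "urban"),
    ("robin", "forest"), ("heron", "wetland"), ("heron", "wetland"),
    ("heron", "wetland"), ("heron", "urban"),
]
LAND_COVERS = ["forest", "urban", "wetland"]


def fit_likelihoods(records, land_covers, alpha=1.0):
    """Estimate P(land cover | species) from counts with Laplace smoothing.

    This frequency estimate is an assumed stand-in for a trained species
    distribution model.
    """
    pair_counts = Counter(records)
    species_totals = Counter(species for species, _ in records)
    k = len(land_covers)
    return {
        (species, cover): (pair_counts[(species, cover)] + alpha)
        / (species_totals[species] + alpha * k)
        for species in species_totals
        for cover in land_covers
    }


def validate(species, land_cover, likelihoods, threshold=0.2):
    """Accept plausible observations; flag unlikely or unknown ones."""
    p = likelihoods.get((species, land_cover))
    if p is None or p < threshold:
        status = "needs_expert_review"
    else:
        status = "auto_accepted"
    # The likelihood is returned so it can be shown to the volunteer
    # as immediate feedback.
    return {"likelihood": p, "status": status}


LIKELIHOODS = fit_likelihoods(HISTORY, LAND_COVERS)

if __name__ == "__main__":
    for obs in [("robin", "forest"), ("robin", "wetland"), ("owl", "forest")]:
        print(obs, validate(*obs, LIKELIHOODS))
```

In this sketch, only observations whose estimated likelihood falls below the threshold, or whose species is unknown to the model, are routed to an expert, which mirrors the division of work between automatic validation and expert verification described above.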

The future objective of this research is to focus more on the challenges of ML and CS integration, and to investigate how this integration can be applied in other CS fields besides biodiversity.