Big data and data quality
Big Data is characterized by the volume and variety of data and the speed at which they are managed.
Geography and ecology, like many other subject areas, have been flooded by a huge amount of diverse data, known as Big Data. Although databases are increasingly accessible and affordable, data are becoming more voluminous because they are often georeferenced and include numerous samples (measurements) in addition to attributes and associated variables (fields). Such data are obtained from increasingly large areas and with greater spatial, temporal and thematic detail. This avalanche of data offers great opportunities for research but also requires new approaches for managing it efficiently, rigorously and accurately, depending on the particularities of the associated thematic information.
Following its classic definition, Big Data is characterized by the volume and variety of data and the speed at which they are managed. However, we find it necessary to add a further property: quality. The quality of the alphanumeric and spatial information in the available data must be analyzed. Likewise, it is necessary to verify that access to metadata, and their maintenance and propagation, are adequate and consistent. This is crucial when creating, editing and transforming the associated databases. Only after assuring the quality of the data used can we be confident that the corresponding models are rigorous and accurate.
- Geospatial databases: We research optimal designs and formats for geospatial data and metadata, including long time series of remote sensing images. Examples include computational optimization, studies of the effects of lossy compression, and studies on data preservation.
- Open Data: We incorporate and promote open and participatory science initiatives, especially in terms of open access to data and metadata.
- Standardization of geospatial quality information: We implement innovative tools for viewing and searching for information about the Earth, prioritizing quality information and the use of suitable standards.
- Genomics: We use Big Data for the genomic characterization of functional traits.
- Forest inventories: We obtain information and develop query tools for over 90,000 georeferenced forest monitoring plots, with information on changes in structure and composition every 10-15 years.
- Big Data for stoichiometric ecology, ecometabolomics, and functional ecology: We develop mathematical models that relate concentrations, contents and their stoichiometric relations to ecosystem function and structure, in particular with regard to carbon fluxes, changes in species and diversity, and their relationships with the main components of global change.
- Integrated and scalable surveillance and control systems: We develop systems for the management, surveillance, control and study of mosquitoes carrying diseases such as dengue, chikungunya and Zika around the world.