Solving the last-mile problem in data access and promoting environmental data science in South America
Data analysis is a discipline widely used in business; it involves processing large amounts of data to discover the information underlying them. Not long ago this technique was limited to large companies, because only they had the necessary data and infrastructure. Over time, improved internet access, increased storage capacity and the publication of data by government agencies have allowed a greater number of researchers to apply these tools in many disciplines.
While data analysis is thriving in developed countries, in South America the situation is far from ideal: very few universities offer a degree in data analysis and, combined with low labor demand, this causes a brain drain throughout the region. This could soon become a serious problem, considering that the region holds more than half of the tropical forests and a wide variety of ecosystems, and that data analysis has become an indispensable tool for measuring the impacts of climate change.
This proposal calls for the creation of local agencies that address this problem in a systemic and proactive way, working with government agencies, academic institutions and civil society groups.
C’SQUARE - The Urban Comfort Rating
What actions do you propose?
"Wasting" is one of the verbs most often used to describe the global issues that must be resolved to achieve sustainability. Not wasting food, not wasting water, not wasting energy and not wasting materials are clear rules invoked every day to create environmental awareness. However, one more rule should be added to collective thinking: not wasting data. Data has become such an important resource in recent times that the existence of scientific data that is not used efficiently is unacceptable. The present proposal identifies two main problems to be solved:
Last mile data problem. - By the last mile problem in environmental data we mean the difficulty the end user faces when collecting data; it is not a problem of data being unavailable to the public. The web portals of the climatological and hydrological agencies of different South American countries offer access to the data of all their monitoring stations. However, their interfaces usually allow access to only one station at a time and for a predetermined period of time, forcing researchers who need data covering a long period from many stations to make an overwhelming number of requests to the server, one for each station and each period. In other cases the data are averaged or otherwise already processed, which is rarely desirable in data analysis, where raw data are usually preferred.
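To give a sense of scale, the number of requests grows with the product of stations and periods. The short sketch below is only illustrative: the station count, the time span and the one-month-per-request limit are assumed values, not figures taken from any specific portal, and `requests_needed` is a hypothetical helper name.

```python
def requests_needed(stations: int, years: int, periods_per_year: int) -> int:
    """Requests required when a portal serves one station and one period per query."""
    return stations * years * periods_per_year


# Hypothetical figures, for illustration only:
# 50 stations, 20 years of records, one request per month of data.
total = requests_needed(stations=50, years=20, periods_per_year=12)
print(total)  # 12000
```

Even with these modest assumed numbers, a single researcher would face thousands of manual downloads before any analysis could begin.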
At the beginning of the movie "The Social Network" (2010), Mark Zuckerberg develops FaceMash by extracting student information from different sources; the information varies from source to source, and Mark has to use a different method to obtain it from each one. The film presents this as proof of his genius, since he manages it in a single night, while less talented people would probably have needed much more time.
This reference illustrates the work that researchers have to do when collecting information from various sources. The availability of data varies widely from source to source, forcing each researcher to rely on their own ingenuity to capture these data and transform them into a format that is useful for their purposes. These efforts are repeated by every researcher, and many of them may not have enough computer skills to perform these tasks successfully.
Most of the proposals mentioned above are similar in the type of data they use as input: meteorological data, biodiversity data and GIS data. If all these data were available from their sources in different formats, each suited to use with a particular analysis tool, researchers would be spared much effort and the barriers to entry for new researchers would be lowered.
Lack of a culture of decisions based on data in the region. - Although this problem is not unique to the region, in South America a wide gap has opened between people who know how to read statistical information correctly and those who do not. This has disastrous consequences for societies when conflicts escalate because of misinformation or misinterpretation of the data.
In response to these problems, we propose the development of local-scale organizations that interact with local government agencies, academic institutions and civil society groups. The objectives set for this organization are:
- Work in conjunction with sources that generate scientific and statistical data to maximize the impact of their work.
- Follow up on the evolution of techniques and tools related to data analysis.
- Advise investigations that involve data analysis.
- Encourage the formation of communities specialized in particular types of data.
- Promote a culture of decision-making based on statistical data.
- Replicate efforts throughout the region.
Found NGO in Bolivia. - Legal registration of the non-governmental organization based in the city of Santa Cruz, Bolivia.
Sign cooperation agreements with institutions. - Four institutions have been identified to work with initially: the National Service of Meteorology and Hydrology of Bolivia (SENAMHI), the National Institute of Statistics (INE), the air quality monitoring network (MONICA) and the Noel Kempff Mercado Natural History Museum (MHNNKM). This selection is based on the magnitude of the data generated by these agencies and on the fact that in two cases the data are already accessible through their web portals. SENAMHI and the INE publish their data on their own portals but, as mentioned above, the data can only be consulted one item at a time, which leaves an overwhelming task for any researcher who wants to download several types of data over a long period and must resort to hundreds of requests to the server to do so.
The initial aim of the agreements is to receive authorization from these institutions, ideally in writing, to collect data from their web portals, transform them and offer them through the organization's own website in a convenient format (CSV files) for the public to download and use with data analysis tools. Collection will initially be manual, until scripts can be developed to automate the task, since certain data from these institutions are usually published daily. This way of working is designed to minimize the work required of the institutions and to increase their initial willingness to cooperate with the present organization.
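The transformation step described above could look roughly like the following sketch, which merges separately collected per-station records into a single CSV table. It is an illustration under stated assumptions, not the organization's actual script: the station names, the readings, the `temperature_c` column and the helper names `consolidate` and `to_csv` are all hypothetical, and no real portal is contacted.

```python
import csv
import io

# Hypothetical per-station downloads: one small table per station,
# as a researcher might collect them one request at a time.
raw_downloads = {
    "station_A": [("2020-01-01", "24.5"), ("2020-01-02", "25.1")],
    "station_B": [("2020-01-01", "18.0"), ("2020-01-02", "")],  # "" = missing reading
}


def consolidate(downloads):
    """Flatten per-station records into one list of row dictionaries."""
    rows = []
    for station, records in sorted(downloads.items()):
        for date, value in records:
            # Missing readings stay as empty cells instead of a guessed value.
            rows.append({"station": station, "date": date, "temperature_c": value})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["station", "date", "temperature_c"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(consolidate(raw_downloads))
print(csv_text)
```

Putting every station in one long table, with the station as a column, is a deliberate choice here: most analysis tools can filter, group and pivot such a file directly.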
The Noel Kempff Mercado Natural History Museum (MHNNKM) is one of the largest biodiversity information centers in the region, as well as the place where most new research on biodiversity in eastern Bolivia begins and ends. The institution has recently undertaken efforts to digitize and publish its data, a task that had been pending for some time. The objectives of this organization coincide with the efforts undertaken by the museum, which makes this the most appropriate moment to carry out joint work.
Develop presentations and seminars. - These teaching activities will initially introduce the basic concepts of data analysis. The presentations will be adapted to the target audience, and the intention is to reach the widest possible variety of knowledge areas. The seminars will be held in various university faculties, each adapted to the application of data analysis tools in the relevant academic discipline. These activities will also help identify people who are carrying out projects or research in this field, in order to offer them advice and follow-up on their work.
Develop ETL guides. - ETL refers to the tasks of extraction, transformation and loading. The ETL guides identify the existing sources of data (institutions, government agencies) and how the data are offered: in digital or analog form, web address, file types, etc. In addition, the guides will indicate how to transform these data in the most efficient way for use with the different data analysis tools. These guides can be quite useful for researchers with little previous experience with ICTs or who are not from the region, given that any help could make a difference for their projects.
Web portal development. - In addition to general information about the NGO, the web portal will contain the data in a format suited to data analysis, the ETL guides, and the projects and research carried out in the region. It should be noted that the portal is not meant to centralize data. Ideally, each data source would offer these formats as an option from its own portal; but, so as not to stall at the outset, the portal seeks to minimize the work required of the allied institutions.
Search for local financing. - Once established, the organization will seek funding from local institutions to ensure the continuity of the project; conditions exist for this to be viable, through either the local government or the private sector. The organization will also seek self-sustainability by offering intensive courses in the use of data analysis software.
Replicate efforts throughout the region. - Given its strategic position, after covering all of Bolivia the organization will seek to replicate its efforts in neighboring countries by promoting the formation of local communities of data analysts, with the support of the present organization.
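As a rough illustration of the three ETL steps, the sketch below parses CSV text, converts the readings to numbers and loads them into a database table. It uses only the Python standard library; the sample data, the `rain_mm` column and the table name `rain` are invented for the example and do not describe any real data source.

```python
import csv
import io
import sqlite3

# Hypothetical extract: CSV text as a data source might publish it.
SOURCE = """station,date,rain_mm
A,2020-01-01,12.5
A,2020-01-02,
B,2020-01-01,3.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: convert readings to floats; missing readings become None."""
    return [
        (r["station"], r["date"], float(r["rain_mm"]) if r["rain_mm"] else None)
        for r in records
    ]


def load(rows):
    """Load: store the rows in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rain (station TEXT, date TEXT, rain_mm REAL)")
    conn.executemany("INSERT INTO rain VALUES (?, ?, ?)", rows)
    return conn


conn = load(transform(extract(SOURCE)))
total_by_station = conn.execute(
    "SELECT station, SUM(rain_mm) FROM rain GROUP BY station ORDER BY station"
).fetchall()
print(total_by_station)  # [('A', 12.5), ('B', 3.0)]
```

Storing missing readings as NULL rather than zero matters in this sketch: SQL aggregates such as SUM skip NULL values, so a gap in the record is not mistaken for a dry day.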
Who will take these actions and which types of actors are involved?
The actions will be led by the author of the present proposal, who will take charge of all activities involving the organization. The key actors are the data-generating institutions. Four of them have been chosen for initial cooperation: the National Service of Meteorology and Hydrology of Bolivia (SENAMHI), the National Institute of Statistics (INE), the MONICA air quality monitoring network and the Noel Kempff Mercado Natural History Museum (MHNNKM). Subsequently, the organization aims to work with Geobolivia and the primary healthcare system, for which the support of the local municipal government is important.
Universities are important because they will be the main beneficiaries of the project. The local municipal government and the private sector are also necessary to continue the project.
Where will these actions be taken and how could they scale?
The proposal's initial location is the city of Santa Cruz de la Sierra, Bolivia. From there it seeks to achieve national reach and, with time and the formation of solid communities, to expand to the neighboring countries in the region.
In environmental terms, Bolivia is a country with high biodiversity in both fauna and flora. It is also a territory where three well-differentiated regions converge: the Andean, the Amazonian and the Gran Chaco. This makes Bolivia a country with a high diversity of ecoregions.
All these characteristics facilitate the replication of the efforts made in this project in neighboring countries.
What impact will these actions have on reducing greenhouse gas emissions and/or adapting to climate change?
The proposal does not directly reduce greenhouse gas emissions, nor does it provide direct solutions for climate adaptation. What it does is make the necessary data available to those who want to do those things.
For example, in the "Absorbing Climate Impacts" contest on this platform, several proposals could use the data provided by this proposal to calculate the risks that inclement weather poses to local crops and to propose more effective insurance methods.
What are the most innovative aspects and main strengths of this approach?
More than an innovation, this is a necessity: data analysis is the gateway to innovative technologies such as machine learning and artificial intelligence. In recent times, these technologies have changed how the future is conceived, making viable projects that previously were not. In the environmental field, these technologies have not yet taken off as they have in other fields; this project tries to create the conditions necessary for innovative solutions to emerge in this field using these technologies.
What are the proposal’s projected costs?
The cost of the project is estimated at 10,000 dollars. These funds will be invested in the legal constitution of the NGO (3,000 dollars) and the salaries of three people for a period of six months (7,000 dollars).
About the Authors
Orlando Andre Cuba Llanos
Environmental engineering student,
Santa Cruz, Bolivia
What enabling environment would be required in order to implement this proposal?
The most necessary condition is the validation of the idea by an institutional authority. It is not easy for a new organization to be taken seriously by established institutions, and acknowledgment by an environmental authority such as the United Nations Environment Programme could make it easier to close cooperation agreements. Apart from this, the other conditions needed for the project to develop successfully are already in place. Bolivia has seen constant economic growth in recent years, which has helped government agencies achieve their goals (Link_1, Link_2). This growth has been reflected in the socioeconomic level of the country's population (Link_3), causing the student population in cities such as Santa Cruz de la Sierra to grow at a rapid pace and forcing the local government and the business sector to take on the challenges of this new landscape (Link_4, Link_5, Link_6). But not everything is positive: the economic growth conceals high environmental costs behind its figures (Link_7, Link_8, Link_9).
We are at a point of no return, both as a nation and as a continent. If we do not begin to understand better what we have at this moment, it is quite probable that in the future it will no longer be possible to do so.