20 Feb 2015
Finding Ocean Health In A Sea of Data
Ever wonder how the Ocean Health Index gets data for its global assessment of the entire ocean? There's more to it than you'd guess!
Illustration of Global Ocean Observation System. Credit: Artwork by Glynn Gorick
produced for UNESCO Intergovernmental Oceanographic Commission (IOC)-Global Ocean Observing System (GOOS).
The world is swimming in data of all kinds, so how do you cut through to
get just the data you need? The secret is asking the right questions.
Basic Questions
Question #1 for our team was, “What is ocean health?” The answer we developed: “A healthy ocean sustainably delivers a range
of benefits to people now and in the future.”
That answer spawned more questions: What benefits should we assess? What reference points (targets) could we use to indicate whether the
flow of benefits is sustainable? What is the relative importance (weight) of
the different benefits and components? What are the best methods to calculate
scores?
Underlying everything was the question, ‘What data exist and how can we find
them?’ Here we’ll discuss some of the challenges
and surprises we encountered. Elsewhere you can read about how we selected benefits,
chose reference points and developed methods for calculating scores.
Sea of Data
The prominent scientific journal Nature
estimated that 10 million researchers worldwide devote about 26 billion hours
per year to research and development, resulting in about 920,000 scientific
articles, and that probably does not count the work of economists, social
scientists and other data producers.
Science
in 2015. Credit: Nature. RightsLink
Copyright Clearance Center License 3558241406342
But all that data streaming into databases all over the world hasn’t yet
made it easy to find the information you want. Woods Hole Oceanographic Institution scientist, Peter
Wiebe, described data centers as ‘…kind of black holes. It’s very hard to figure out what’s in there
and how to get it out.’
Marcia McNutt,
editor of the influential Science
magazine, recently highlighted the need to improve data ‘discoverability.’
Data usually must answer the basic questions that journalists ask when
writing a story: Who, What, When, Where and Why, but in somewhat different
ways. In this case ‘Who’ names the
source of the data; ‘What’ is the measurement itself, along with narrative
‘meta-data’ providing additional information such as the units of measurement; ‘When’ is the time and date of the measurement; and ‘Where’ is the latitude,
longitude, depth and other information on the location of the measurement or
sample.
Then there’s ‘Why.’ There are
dozens of reasons why data may be gathered ranging from personal interest to
mandated collection, and also including availability of funding, geographical
convenience, and many others. Since data
may have been collected for a very different reason than why one wants to use
it, it may be more or less detailed or useful than desired, but it will have to
do.
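To make the five questions concrete, here is a minimal sketch of how a single data point might be recorded so that each question has a place to live. The class name, field names and example values are illustrative assumptions, not a format used by the Ocean Health Index or any particular data center.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One data point described by Who, What, When, Where and Why (hypothetical layout)."""
    source: str                       # Who: the provider of the data
    variable: str                     # What: the quantity that was measured
    value: float                      # What: the measurement itself
    units: str                        # What: metadata on the units of measurement
    timestamp: datetime               # When: time and date of the measurement
    latitude: float                   # Where: degrees north
    longitude: float                  # Where: degrees east
    depth_m: Optional[float] = None   # Where: depth in metres, if relevant
    purpose: Optional[str] = None     # Why: the reason the data were collected

    def is_complete(self) -> bool:
        """True when Who and What are filled in and Where is a valid position."""
        return (
            bool(self.source)
            and bool(self.variable)
            and bool(self.units)
            and -90.0 <= self.latitude <= 90.0
            and -180.0 <= self.longitude <= 180.0
        )


# Invented example values, for illustration only.
example = Observation(
    source="hypothetical buoy network",
    variable="sea_surface_temperature",
    value=18.4,
    units="degC",
    timestamp=datetime(2015, 2, 20, 12, 0, tzinfo=timezone.utc),
    latitude=41.5,
    longitude=-70.7,
    depth_m=1.0,
    purpose="routine monitoring",
)
```

Note that ‘Why’ is optional in this sketch: the reason for collection is often undocumented, which is exactly why data gathered for one purpose can be hard to judge for another.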
Security agencies sift enormous streams of data trying to detect and
prevent acts of terrorism, and businesses analyze
the social media use and purchasing history of consumers to hone product development,
advertising and sales. They can identify broad strategies and, if they
wish, things as specific as ‘the countries visited, movies downloaded and
favorite colors of all left-handed men with Facebook accounts who surf.’
Data discovery and integration in the ocean sciences isn’t yet that
sophisticated, partly because of variability in the type and qualities of data
as well as the variety of things different people want to find out.
For example, global scale data for variables such as sea surface
temperature, sea-ice extent, sea level rise or the extent of coastal mangrove
forests now come primarily from satellites that cover the planet relatively seamlessly, all making measurements
in the same way. Large scale arrays of
buoy systems provide similar in-water coverage. The data are easy to get if you know where to look.
Map of devices reporting ocean data to Planet
OS and its Marinexplore project. Image by Kristian Paljasma, Planet OS, used
with permission.
Data for other integrative topics, such as fisheries landings, mariculture production, economic indicators (jobs, wages, revenue) and others are gathered by agencies in different countries then submitted to international or multi-national groups for aggregation. The data are accessible, but vary in quality and are often harder to interpret.
Still other data layers must be cobbled together using information contributed by many different investigators. Examples used in the Ocean Health Index include extent and condition of seagrass beds, population assessment and extinction risks for iconic species, and many aspects of biodiversity.
Some desired data layers may not exist at all and must be approximated, either by mathematical modeling or by using other directly or indirectly related data (proxy data). For example, there are no globally consistent data on the concentration of chemical pollutants in the ocean, so the Ocean Health Index bases its indicator on the amount of pesticides used by each country, as well as the distribution of commercial ship tracks and major ports and harbors. It mathematically models the flows from land into watersheds and rivers, then models the plumes of pollution diffusing from river mouths out into the ocean.
Similarly, data on pollution by pathogenic bacteria are lacking in most countries, so the Ocean Health Index uses proxy data from the World Health Organization and UNICEF on the percentage of the coastal population in each country that has access to adequate sanitary facilities.
As a final example, any data collected on the cultural, traditional and aesthetic importance of the ocean and coasts to a country’s citizens are not reported to a centralized international database, so those rather intangible benefits had to be represented by two proxy measures in the Sense of Place goal: the status of iconic species and the percentage of the coast and near-shore waters held in protective status.
Drawing on all these types of data and techniques, the Ocean Health Index assembled a list of more than 80 global data layers that are woven throughout its structure. Some are quantitative measures, such as the amount of fish caught or percentage of ocean protected; others are categorical, i.e. expressed not as measurements but as ranks or classes, as illustrated by IUCN Red List’s categories of extinction risk.
Categories of extinction risk for the IUCN Red List. Each species (or population) assessed is
classed into one of the categories shown.
Visualization credit: Encyclopedia Britannica
online
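Categorical data such as the Red List classes can still be compared and summarized once they are given an order. The sketch below assigns each category a simple integer rank, from least to most severe. The evenly spaced ranks and the function names are assumptions made for illustration; they are not the weights the Ocean Health Index itself applies.

```python
from enum import IntEnum
from typing import Iterable, Optional


class RedListCategory(IntEnum):
    """IUCN Red List categories ordered from lowest to highest extinction risk."""
    LEAST_CONCERN = 0
    NEAR_THREATENED = 1
    VULNERABLE = 2
    ENDANGERED = 3
    CRITICALLY_ENDANGERED = 4
    EXTINCT_IN_THE_WILD = 5
    EXTINCT = 6


# Standard two-letter Red List codes. Data Deficient (DD) and Not Evaluated (NE)
# carry no risk information, so they are deliberately left out of the mapping.
_CODES = {
    "LC": RedListCategory.LEAST_CONCERN,
    "NT": RedListCategory.NEAR_THREATENED,
    "VU": RedListCategory.VULNERABLE,
    "EN": RedListCategory.ENDANGERED,
    "CR": RedListCategory.CRITICALLY_ENDANGERED,
    "EW": RedListCategory.EXTINCT_IN_THE_WILD,
    "EX": RedListCategory.EXTINCT,
}


def risk_rank(code: str) -> Optional[RedListCategory]:
    """Return the ordered category for a Red List code, or None if unranked."""
    return _CODES.get(code.strip().upper())


def most_threatened(codes: Iterable[str]) -> Optional[RedListCategory]:
    """Return the highest-risk category among the codes, ignoring unranked ones."""
    ranks = [r for r in map(risk_rank, codes) if r is not None]
    return max(ranks) if ranks else None
```

Because the categories are ranks rather than measurements, comparisons such as "Endangered is worse than Vulnerable" are meaningful, but arithmetic on the integers (averaging, for instance) only makes sense once someone has chosen and justified a weighting.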
You will soon be able to read more about data discovery as part of a
manual to help researchers carry out regional studies of ocean health in their
own countries. The manual will be posted
at www.ohi-science.org. Details on all data and data sources used by
the Ocean Health Index for global assessments are shown in Table S23 here,
Table S9 here
and in descriptive sections of those documents.
The difficulties associated with data availability, discovery and
quality control will be apparent to readers of those documents.
The Ocean Health Index’s data challenges are symptomatic of the fact that
learning about the ocean has neither seemed as urgent as national security threats
nor as immediately financially rewarding as increased access to consumers. Growing awareness of the ocean’s importance is
beginning to change that. The Google Ocean platform
now visualizes data from many projects and expeditions, though it is not a data
source. The Geographic Information
Systems (GIS) company Esri, which provides
the maps on the Ocean Health Index Web site, is producing an ocean basemap to
incorporate many different marine data layers that will be useful for coastal
zone spatial planning. Those data will
be available by subscription.
A ‘big data’ company, Planet OS, has pioneered
a single platform for planetary data transformation, access and on-demand visualization. One of their products, Marinexplore,
already houses more than a trillion data points originating from 43,415
datastreams coming from 33 institutions, 41 projects and numerous ocean buoy
systems. It is relatively easy to find selected data within Marinexplore,
because the whole system is designed for that. One researcher’s discussion of how the system will benefit data
management for sound in the ocean environment is here.
As these and other new systems for gathering, archiving, and managing
data become available, data discovery for projects such as the Ocean Health
Index will become less difficult. As
regions at all scales agree upon the data and standard collection procedures
needed for the Index and commit to gathering them on a regular schedule, the quality and
usefulness of results will steadily improve.
Defining the Study Area
Mention of scores raises a final critical question: what is the ‘study
area’ of an assessment, the spatial scale at which final Index scores are
reported?
One option is to use the major oceans, such as the North Atlantic, South
Atlantic, Southern Ocean etc., as study areas. The United Nations World Ocean Assessment, now in development, uses
those areas as an organizing principle. Some
research and resource management organizations focus on particular oceans, but the
oceans do not have administrative units for comprehensive research and data
acquisition, let alone use of results for management. Therefore data for each ocean would have to be
assembled from research results gathered independently by scientists and
agencies working in many countries.
Another option could be to assess the oceans’ major ecosystems. Known as Large Marine Ecosystems (LMEs), each
has its own structure and dynamics and many are being studied as units. All encompass the coastlines of several
nations or states. None have
administrative units with regulatory powers.
Further candidates might be the 232
marine ecoregions into which the world’s coastal and continental shelf areas
have been divided. Some of the
ecoregions include several nations or territories; in other cases a nation may
include several ecoregions. Lack of
administrative entities with regulatory power as well as transboundary issues
weighed against using these areas for the global Ocean Health Index.
Final biogeographic framework showing ecoregions, numbered as listed in
Box 1 in Spalding
et al. (2007). Credit: BioScience
by American Institute of Biological Sciences reproduced with permission via
Copyright Clearance Center.
In the end, two considerations persuaded the Ocean Health Index team to
use the coastlines and marine waters of nations and their territories as study
areas. First was the increased
likelihood of finding data. Nations frequently maintain for their own purposes
databases of information collected by their scientists, economists and others.
Second, nations usually have administrative units able to set policies or enact
and enforce regulations, actions necessary for maintaining or improving ocean
conditions.
Choice of this study area raised the question: ‘How far seaward and inland
should the study areas extend?’ Our seaward choice was 200 nautical miles, the
width of the Exclusive Economic Zone (EEZ).
The EEZ
concept was adopted as part of the U.N. Convention on the Law of the Sea
(UNCLOS) in 1982. That convention defines a country’s territorial waters as a band extending 12 nautical miles (22.2 km)
off the coast from some baseline, usually the low-water mark, within which it
retains exclusive sovereignty of the water, subsoil, seabed and airspace
above. A country may also claim a contiguous zone extending a further 12 nautical
miles seaward of its territorial waters, within which it may enforce customs, fiscal, immigration
and sanitary laws. The exclusive
economic zone (EEZ) extends still further seaward, but not beyond 200
nautical miles (370.4 km) from the coast. Within the EEZ, the coastal state has sovereign rights to explore,
exploit, conserve and manage living and non-living natural resources of the
seabed and waters, as well as to produce energy from the water, currents and
winds. It can also establish and use
artificial islands or other structures for economic or marine scientific
purposes or to preserve the marine environment. However, within one country’s EEZ, other states still maintain the
traditional high seas freedoms, such as freedom of navigation and overflight
and the freedom to conduct military exercises. Additional information is shown here.
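The distances above are easy to check, since a nautical mile is defined as exactly 1.852 km. The short sketch below converts the limits and labels a point by its distance from the baseline. It assumes the contiguous zone ends 24 nautical miles from the baseline (the 12-mile territorial band plus a further 12), and it ignores real-world complications such as boundaries with neighboring states; the function names are illustrative.

```python
KM_PER_NAUTICAL_MILE = 1.852  # exact, by international definition

# Maximum outer limits, measured from the baseline, in nautical miles.
TERRITORIAL_LIMIT_NM = 12.0
CONTIGUOUS_LIMIT_NM = 24.0   # assumption: 12 nm territorial band + a further 12 nm
EEZ_LIMIT_NM = 200.0


def nm_to_km(nautical_miles: float) -> float:
    """Convert nautical miles to kilometres."""
    return nautical_miles * KM_PER_NAUTICAL_MILE


def maritime_zone(distance_nm: float) -> str:
    """Name the zone for a point at the given distance seaward of the baseline."""
    if distance_nm < 0:
        raise ValueError("distance from the baseline cannot be negative")
    if distance_nm <= TERRITORIAL_LIMIT_NM:
        return "territorial waters"
    if distance_nm <= CONTIGUOUS_LIMIT_NM:
        return "contiguous zone"
    if distance_nm <= EEZ_LIMIT_NM:
        return "exclusive economic zone"
    return "high seas"
```

For example, `nm_to_km(12)` gives about 22.2 km and `nm_to_km(200)` gives about 370.4 km, matching the figures quoted above. The contiguous zone lies inside the 200-mile band, so a point 20 nautical miles out falls within both; this sketch simply reports the innermost label that applies.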
Unfortunately, use of EEZs as study areas does not solve
every problem. Even in the most information-rich
EEZs, data gaps still exist, so methods were required to fill holes within
existing data series or to estimate desired measurements using models or ‘proxy’
data.
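As one illustration of what filling a hole in a data series can mean, the sketch below uses straight-line interpolation between the nearest known values on either side of a gap. This is a generic technique chosen for simplicity, not necessarily the method the Ocean Health Index applies to any given layer, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def fill_gaps(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbors.

    Values are assumed to be evenly spaced in time (for example, one per year).
    Gaps at the very start or end have only one known neighbor, so they are
    left as None rather than extrapolated.
    """
    filled: List[Optional[float]] = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]

    # Walk through each pair of consecutive known points and fill between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / span
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])

    return filled
```

A series such as `[None, 2.0, None, None, 5.0, None]` keeps its leading and trailing gaps but has the two interior holes filled with values on the line between 2.0 and 5.0. Whether interpolation is appropriate depends on the variable: it suits slowly changing quantities far better than ones that swing sharply from year to year.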
Moreover, measurement of the entire EEZ is not appropriate for all goals
or data layers.
Exclusive Economic Zones (EEZs) of the world, along with
disputed areas (shown in red). Credit: Prof.
Jean-Paul Rodrigue, Dept. of Global Studies & Geography, Hofstra University.
Data source: VLIZ. Third party use
of this illustration requires permission of Prof. Rodrigue.
Look closely at the map above and you will see disputed EEZ
boundaries and areas off the coasts of Argentina, Peru/Chile, Colombia/Nicaragua,
Norway, Turkey, northern Japan/Russia,
South Korea/Japan, Taiwan (Province of China) and throughout the South China Sea. Some of these disputes are moving toward
resolution, others are not. The South
China Sea disputes illustrate the challenges to resolution. The area of the Chinese EEZ aggregated
by marineregions.org and used by the Ocean Health Index is 875,263.6 km², an area
that includes waters out to 200 nautical miles off the coasts of China and its two Special Administrative Regions, Hong Kong and Macau. However, China also claims an additional 3
million km² of the South China
Sea, various portions of which are disputed by other countries in the region
including Vietnam, the Philippines, South Korea, Japan, Malaysia, Brunei and
Singapore.
Such marine territorial disputes can
only be solved by international agreements in accordance with UNCLOS, and resolution
is complicated and slow. Projects such
as the Ocean Health Index can only use agreed boundaries. The only publicly available source for
that information is the VLIZ database compiled by http://www.marineregions.org/ which
uses methods shown here. VLIZ adds corrections during its regular database
updates, so whenever countries resolve boundary disputes, their newly agreed
EEZ boundaries will become part of the updated database. In the meantime, VLIZ maintains
information on the disputed areas, as shown in the above map. Disputed areas are excluded from the scores
calculated by the Ocean Health Index. If and when new EEZ boundaries are determined, scores can easily be
recalculated.
Disputed regions create interpretational and practical
problems for the disputing countries. Though
the Ocean Health Index has no other option, posting of a score for an area
bounded in a way that a country does not accept can be politically uncomfortable.
Such a country might wish to carry out its own study using the Ocean
Health Index framework but self-defining its study area boundaries. The country would have or could obtain data for
the VLIZ-defined portion of that area, but might have difficulty obtaining similar
information for any portion lying within the VLIZ-defined boundaries currently assigned
to other countries.
Take Home Message
These problems notwithstanding, as long as research provides a steady
flow of data points, each one describing ‘Who, What, When, Where and Why’, the
Ocean Health Index can accommodate any future alterations of EEZ boundaries or
other methodological changes. Improved systems
for archiving, managing, discovering and displaying those data will help this
and other projects point the way toward healthier oceans and healthier, more
sustainable societies.
The information contained in this article applies to all
of the goals and methods included in the Ocean Health Index. Thanks to Julia Stewart Lowndes and Courtney
Scarborough for helpful editorial comments.