Big health data: Australia's big potential
Big data has the potential to create big opportunities for Australia. A
recent estimate by Lateral Economics suggests that open government data could
contribute up to $25 billion per annum across the economy.
This analysis also suggests that health-specific data held by the Australian
Government could alone account for an increase of $5.9 billion per annum.
Big data also creates opportunities for considerable savings to the
Australian health care system. Professor Fiona Stanley, Patron and former
director of the Telethon Kids Institute, told the committee that significant
gains could be made with the health budget if government appropriately
harnessed linked health data. Professor Stanley suggested that linked data
could be used to reduce costly but ineffective clinical interventions, detect
and prevent harmful health outcomes through early intervention and also alert
regulators to fraud in the healthcare system.
These are just some of the potential benefits Australia may obtain if
the Australian Government and the States and Territories combined and fully
utilised their administrative datasets.
Over the last three years, Australian Public Service agencies have been
working together to promote a new approach to using and releasing datasets held
by the Australian Government.
On 7 December 2015 the Prime Minister, the Hon Mr Malcolm Turnbull MP
and the Minister for Industry, Innovation and Science, the Hon Mr Christopher
Pyne MP, launched the National Innovation and Science Agenda.
One of the agenda's key planks was for government to 'lead by example in the
way Government invests in and uses technology and data to deliver better
This announcement coincided with the release of the Public Sector Data
Management report and the Public Data Policy Statement.
The report and the statement are considered at paragraphs 2.50–2.56 below.
The committee has previously heard from the Population Health Research
Network (PHRN) in October 2014 about some of the challenges faced in
maintaining health data linkages and in encouraging custodians of health data
to be more open in releasing their data sets.
These and similar concerns from other witnesses prompted the committee to initiate
this current examination of issues relating to big data and data linkage.
This chapter will consider the meaning of data linking and the new
opportunities for Australia to harness the full benefits of big data and data
linkage. This will be considered having regard to the existing framework and
the government's recently announced data policies.
There are some key concepts that are important for this report. These
include: big data, data linkage, data custodianship, unit record level data and
data linkage keys.
The phrase 'big data' has been defined to mean 'high-volume,
high-velocity and/or high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced
insight, decision making, and process optimization'.
Examples of big health data include:
- analysing the Australian Childhood Immunisation Registry and all childhood
immunisation records in Western Australia and New South Wales, involving the
analysis of 1.8 million records;
- an analysis of unplanned hospital stays for Western Australian
seniors, requiring the linkage of 153 million digital records from six data
Data linking is the bringing together of two or more data sets to create
a new, richer data set.
By bringing together sets of data that were previously isolated, researchers,
clinicians and governments can deepen their understandings of the ways people
actually use the health care system. This has the potential to inform
government policy making and decisions about improving service delivery.
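As a minimal sketch of what bringing two data sets together involves, the following Python combines hypothetical hospital admission records with hypothetical medicine dispensing records that share a person identifier. The field names, identifiers and values are invented for illustration and do not reflect any actual government data set or linkage system.

```python
from collections import defaultdict

def link_datasets(admissions, dispensings, key="person_id"):
    """Combine two lists of records into one richer record per person."""
    linked = defaultdict(lambda: {"admissions": [], "dispensings": []})
    for row in admissions:
        linked[row[key]]["admissions"].append(row["reason"])
    for row in dispensings:
        linked[row[key]]["dispensings"].append(row["medicine"])
    return dict(linked)

# Hypothetical, invented records held by two separate custodians.
admissions = [
    {"person_id": "A1", "reason": "fall"},
    {"person_id": "B2", "reason": "chest pain"},
]
dispensings = [
    {"person_id": "A1", "medicine": "sedative"},
    {"person_id": "C3", "medicine": "statin"},
]

linked = link_datasets(admissions, dispensings)
print(linked["A1"])
```

Neither source on its own shows that person A1 was both dispensed a medicine and admitted to hospital; only the combined record does.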
According to the National Statistics Service, data custodians are:
...agencies responsible for managing the use, disclosure and
protection of source data used in a statistical data integration project. Data
custodians collect and hold information on behalf of a data provider (defined
as an individual, household, business or other organisation which supplies data
either for statistical or administrative purposes). The role of data custodians
may also extend to producing source data, in addition to their role as a holder
For example, the Department of Health is the custodian of the Medicare
Benefits Schedule data.
Unit record level data
A distinction needs to be made between individual unit records and
aggregated data. Aggregated data provides information about a population as a
whole and no individual can be identified from that data.
An example of aggregated data is the Census.
This can be contrasted with unit record level data which, according to the
Australian Bureau of Statistics, is:
...a file of responses to ABS surveys or censuses that have had
specific identifying information about persons and organisations
confidentialised. [The unit record level data files] contain very detailed
information for each individual record - a record can be a person, a business,
a family, household or a job for example.
For researchers who wish to understand the health system or are
interested in a particular pharmaceutical product, it is preferable to have de-identified
unit level records, as Dr Merran Smith, Chief Executive of the PHRN, explains:
Aggregated data is valuable and even linked aggregated data
is valuable. But it probably cannot do the sorts of things we are talking about
for the health/medical research that really needs the detail.
For that reason, researchers need access to de-identified unit record
level data to achieve the best result.
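The distinction can be illustrated with a small Python sketch. The six records below are invented for illustration and every field name is an assumption; the point is only that aggregate counts can look identical whether or not two factors are related, whereas unit records allow the relationship to be examined.

```python
from collections import Counter

# Hypothetical de-identified unit records: one row per person.
unit_records = [
    {"on_medicine": True, "admitted": True},
    {"on_medicine": True, "admitted": True},
    {"on_medicine": True, "admitted": False},
    {"on_medicine": False, "admitted": True},
    {"on_medicine": False, "admitted": False},
    {"on_medicine": False, "admitted": False},
]

def aggregate(records, field):
    """Collapse unit records into population-level counts for one field."""
    return Counter(r[field] for r in records)

def admission_rate(records, on_medicine):
    """Share of people admitted among those on (or not on) the medicine."""
    group = [r for r in records if r["on_medicine"] == on_medicine]
    return sum(r["admitted"] for r in group) / len(group)

# Aggregated view: three people admitted, three people on the medicine,
# with no way of telling whether they are the same people.
admitted_counts = aggregate(unit_records, "admitted")
medicine_counts = aggregate(unit_records, "on_medicine")

# Unit record view: admission rates can be compared between the two groups.
rate_on = admission_rate(unit_records, True)
rate_off = admission_rate(unit_records, False)
print(admitted_counts, medicine_counts, rate_on, rate_off)
```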
Data linkage key
A data linkage key is a code that is constructed to replace identifying
information, such as name, date of birth and address on a linked record in
order to protect the privacy of the subjects of the study. By using a linkage
key, researchers can link records that belong to the same person from multiple
datasets without needing to know who the person is.
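One way such a key could be constructed is sketched below in Python, using a keyed hash of a person's name and date of birth. This is an illustrative assumption only: the function names, the secret and the sample records are hypothetical, and the sketch is not a description of the method used by any Australian data linkage unit, which may rely on quite different techniques.

```python
import hashlib
import hmac

def linkage_key(name, date_of_birth, secret):
    """Derive an opaque code from identifying details using a keyed hash."""
    normalised = f"{name.strip().lower()}|{date_of_birth}"
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()

def replace_identifiers(records, secret):
    """Return copies of the records with name and date of birth replaced by the key."""
    keyed = []
    for record in records:
        content = {k: v for k, v in record.items() if k not in ("name", "dob")}
        content["key"] = linkage_key(record["name"], record["dob"], secret)
        keyed.append(content)
    return keyed

# Hypothetical secret, assumed to be held only by the body creating the keys.
SECRET = b"held-only-by-the-linkage-unit"

# Invented records from two separate data sets.
hospital = [{"name": "Jane Citizen", "dob": "1950-01-31", "reason": "fall"}]
pharmacy = [{"name": "jane citizen ", "dob": "1950-01-31", "medicine": "sedative"}]

hospital_keyed = replace_identifiers(hospital, SECRET)
pharmacy_keyed = replace_identifiers(pharmacy, SECRET)

# A researcher sees only the keys, yet can tell the records belong together.
same_person = hospital_keyed[0]["key"] == pharmacy_keyed[0]["key"]
print(same_person)
```

The records handed to the researcher contain no name or date of birth, and without the secret the key cannot readily be recomputed from a guessed identity.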
Additional terms used in this report may be found in the Glossary.
Data is an important and valuable government resource. Data linking has
the capacity to maximise that resource and to create new opportunities for more
complex and expanded evidence-based policy and research.
Professor Stanley highlighted the benefits to government of using more linked
data:
...[Australia] would be second to none in the world in enabling
us to evaluate all the outcomes of all [government] services that are provided.
[Australia] would be able to influence and evaluate evidence based practice; we
would be able to look at the epidemiological trends and risk factors of major
and costly problems.
In the medical sphere there are some shining examples of how data
linking has improved health outcomes. For instance, data linking has helped to
identify the role of folate in pregnancy in reducing neural tube defects, such
as spina bifida.
The Northern Territory Government facilitated 'a study that reviewed the
association between primary care utilisation and the number of hospital
admissions for the NT remote Aboriginal population'.
Linked data sets have also been used to 'estimate the prevalence of
dementia in the NT Aboriginal and non-Aboriginal populations' and analyse the
'cost effectiveness of primary care in the management of diabetes'.
The Commonwealth Scientific and Industrial Research Organisation (CSIRO)
has used linked data to create a Patient Admission Prediction Tool (PAPT) that
is helping to make hospitals more efficient.
The tool uses historical data from emergency departments and hospital data sets
to model the number of patients that are likely to present at the emergency
department and the numbers that are likely to require admission to wards. The
CSIRO notes that improved access to hospital datasets held by the Australian
Government would ensure that decisions could be made on the most comprehensive
Many witnesses argued that governments could facilitate a greater degree
of health data linkage, thereby releasing significant untapped opportunities.
For instance, the Council of Academic Public Health Institutions Australia
(CAPHIA) noted that linking State and Australian Government datasets has:
...the potential for national, state and local comparative
effectiveness, clinical trials and registry research that has thus far been
largely untapped, to drive health policy, redesign, quality improvement and
evidence translation in health care. Additionally, it enables...the rigorous
objective evaluation of health policy for government and key policy
professionals; and the ability to compare trends nationally, to identify
programs that deliver value for money and to avoid wasting resources on those
that are not delivering. The result is better targeted, evidence-based and more
cost-effective health policy, services and interventions for the Australian
In addition to the excellent research outlined in paragraphs 2.21–2.22,
the Northern Territory submitted that the following opportunities may be
possible if more Australian Government datasets were accessible:
- Geographic distribution of Medicare and PBS [Pharmaceutical
Benefits Scheme] funded service access mapped against state based services or
- Socioeconomic distribution of Medicare and PBS funded service
- Associations between utilisation of Medicare funded services
and hospital and/or [Emergency Department] services...
- The distribution of PBS funded items and measures of health
- Quality and safety measures of primary care, by linking
Medicare or PBS items and outcomes such as diabetic control, hospitalisation
The Australian Government also acknowledged the latent potential of data
linkage. For example, Department of Health representative Ms Alanna Foster,
First Assistant Secretary, told the committee:
Linked data would also enable understanding of the full
extent of patients' health-service usage—that is, it would be possible to
follow patients' pathways through the system and answer questions about patient
populations, such as: are the high users of primary care also high users of the
hospital system? If we provide better access to chronic disease management in
primary care are patients less likely to present to hospital? What interactions
do patients have, with their General Practitioners (GPs), when they leave
With big-data technologies linking and advanced analytic
capabilities, we could, for example, use pattern mining to quickly identify
adverse events that may arise from medical devices or health services, use
cluster analysis to assign patients to like groups—for example, identifying
groups with diabetes or cardiovascular conditions that may be amenable to
policy intervention and then model the impacts of those imperfections, in terms
of costs and patient outcomes. We could use pathways analysis to investigate
how patients—for example, cancer patients—are moving through the health system
and model the impact of policy interventions targeted at improving these
pathways. These are just some of the tools that could be used when forming
government decision making and the work of researchers.
The Australian experience stands in stark contrast to that of other
developed economies that have already liberalised their use of administrative
data. In 2013 the Productivity Commission reported that:
In Denmark, Sweden, Finland and the Netherlands, linked
administrative data are accessible for research purposes. Statistics Finland
considers that statistics should be compiled from administrative records whenever
possible — around 96 per cent of its data come from these sources. This
openness promotes research — ‘microsimulation specialists pour into Nordic
countries because of their liberal approach towards sharing statistics’...
Meanwhile, Australian researchers, frustrated at the relative
inaccessibility of Australian datasets, are choosing to use datasets from other
countries. For instance, Professor Philip Clarke, Professor of Health Economics
at the University of Melbourne, informed the committee:
Other countries have very good datasets. I have done work
with Scandinavian registries in diabetes. They make those available... I am
currently building a cardiovascular health policy model with funding from the
NHMRC [National Health and Medical Research Council], but explicitly in my
application I said I would be using New Zealand data, because there was no
appropriate Australian data. I am able to work with researchers at the
University of Auckland. There are half a million clinical records with
cardiovascular patients that have had their cardiovascular risk assessed. Those
have been linked to hospital records and medical records, and I am able to work
with researchers almost immediately to start analysing that. I would be
dreaming if I thought that could happen in Australia within the next few years.
Australia is missing out on important opportunities to identify health
risks for our own population because Australian Government datasets are
inaccessible. This is particularly the case with pharmaceutical safety. Professor
Sallie-Anne Pearson, Head of the Medicines Policy Research Unit at the Centre
for Big Data Research in Health, noted that data inaccessibility has meant that
medicine safety research is not commonly undertaken in Australia:
...fewer than 30 studies have examined drug safety in the last
25 years. This needs to change. Australia is actually well-placed to deeply
understand our return on PBS investment, and also other health programs. The
data already exists. We have information that covers our entire population.
The lack of research is surprising when there are 190 000
hospitalisations caused by medications in Australia every year at a cost of
$660 million to the health care system.
Witnesses told the committee that Australia could safely exploit the existing
PBS data for the benefit of Australians. Dr Barbara Mintzes, Senior Lecturer in
Pharmacy at the University of Sydney, informed the committee of the approach of
several other developed countries:
The experience to date in Canada, the US, the UK and Scandinavia
makes it clear that these databases are important tools for medication safety
and protection of public health.
In some cases Australia has been collecting data for years, but because the data
is not fully utilised its collection is rendered fruitless. As Professor Fiona
Stanley told the committee:
My biggest anguish has been that over 30 years of setting up
a birth defects registry to find the next thalidomide, another one could be
happening all the time and we are unable to detect it.
In 2015 the Productivity Commission attempted to articulate why
Australia was falling behind other developed countries in releasing
administrative data. In its Efficiency in Health research paper the
Productivity Commission suggested several reasons including:
- concerns about privacy;
- that processes for accessing administrative data were poorly
structured and did not encourage researchers;
- a lack of transparency about what data government holds; and
- a tendency for data owners to develop costly ad hoc datasets
rather than developing enduring continuous datasets for use by multiple
The Productivity Commission concluded that:
The potential of administrative data is not being realised in
Australia, and the lost opportunities will only grow as technology continues to
open up new ways to use and analyse data. Calls to release and better link
administrative datasets have been made previously by the Commission and by
The evidence heard by the committee and received in submissions suggests
that Australia has significant health data assets and medical research capabilities.
The evidence also clearly demonstrates that in comparison to other countries
Australia is failing to capitalise on its data potential.
The committee recommends that Australia form partnerships with other
countries engaged in data linking to ensure that Australian data access and
linkage policies and regulations are developed to world's best practice.
As the Productivity Commission and other experts have noted, the factors
that are holding Australia back are largely barriers erected by the legislative
framework or its application by the public service. The blockage is not in
technical expertise or infrastructure. Australia has a world leading data
linkage system and many talented researchers and academics in the field.
Experience and history
Australia's modern data linkage capacity dates back to 1995. Before this
time, some statistics were collected but, as Emeritus Professor D'Arcy Holman,
formerly a Professor of Public Health at the University of Western Australia,
noted, 'what we could do with health statistics...was severely constrained by the
technical infrastructure available to us'.
That changed in 1995 when the Western Australian Data Linkage System (WADLS)
was formed.
The formation of the WADLS allowed population health researchers to:
...map over 30 pre-existing health databases on the people of
WA. The links mean that the journeys of individuals through the health system
can be followed anonymously over many years and thus their risk factors for
major diseases, and the use and outcomes of health services can be evaluated
using anonymous information.
More information on the change in the use of technology and how
improvements in technology are being used to protect privacy can be found in
At the Australian Government level there is a restriction on who can
perform the data linkage function. The Australian Government requires that only
certain accredited 'integrating authorities' may link Australian Government
data. More information on integrating authorities can be found in Chapter 3.
Each State and Territory either has its own data linkage unit or is
associated with a data linkage unit.
In 2004 the Australian Government established the National Collaborative
Research Infrastructure Strategy (NCRIS). Through NCRIS the government provided
$20 million to establish the PHRN.
The PHRN is a national network that works to support collaboration between data
linkage units and further Australia's linkage potential.
State / Commonwealth divide
Witnesses told the committee that Australia's federal constitution
contributes to its data challenges. As Emeritus Professor Holman noted:
Australia differs from other federations, Canada for example,
in that our [Australian] Government has not directed its financial support for
these integral components of health care through the states, but has
established itself as a separate vertical player.
This State / Commonwealth divide means that the Australian Government
collects primary health and aged care data whilst the States collect hospital,
births, deaths and cancer information. A list of the Australian
Government's major health related data holdings can be found in Appendix 4.
One of the challenges to sharing data between the Australian Government
and the States and Territories has been a reticence by Australian Government
departments to release data based on privacy concerns. Ms Alanna Foster, First
Assistant Secretary of the Department of Health, insisted that 'due to the separate
legislative requirements, it can be challenging to link these datasets while
also adhering to strict privacy guidelines'.
One of these privacy guidelines requires that MBS [Medicare Benefits
Schedule] and PBS data cannot be linked and another requires that Australian
Government data linkages must be destroyed at the conclusion of the project.
These two restrictions will be considered in greater detail in Chapters 3 and 4.
Despite these restrictions, Professor Clarke told the committee that 'there
have been linkages but they tended to be sporadic'.
However, Emeritus Professor D'Arcy Holman described the period between
2007 and 2012 in Western Australia when 'things were different'. This was
because, as Emeritus Professor Holman recalled:
The two separate information systems [the Australian and
Western Australian] were permitted to talk one with the other.
A short reprieve of different senior administration in the [Australian
Government] led to a collaboration with the State to include the Medicare,
pharmaceutical and aged care data within the WADLS system. This was the first
and only instance since federation that the [Australian Government] and an
Australian State agreed to integrate their data in a functional way to create a
total picture of health system performance.
2.49 In late 2015, government attitudes toward sharing data started to
change. On 3 December 2015, the Department of the Prime Minister and
Cabinet released the Public Sector Data Management Report.
The report sets out a roadmap towards the regular and systematic release
of public sector data and highlights the need to reform certain areas to enable
the Australian Public Service to get the most out of Australia's data holdings.
On 7 December 2015, the Department of the Prime Minister and
Cabinet released the Australian Government Public Data Policy Statement.
The statement declares that Australian Government entities will:
- make high-value data available for use by the public, industry
and academia, in a manner that is enduring and frequently updated using high
- securely share data between Australian Government entities to
improve efficiencies, and inform policy development and decision-making;
- engage openly with the States and Territories to share and
integrate data to inform matters of importance to each jurisdiction and at the
- ensure all new systems support discoverability, interoperability,
data and information accessibility and cost-effective access to facilitate
access to data.
Whilst this was seen as a welcome development, it was a surprise to many
non‑government witnesses who told the committee that they had not been
consulted and were not aware that the government had been working on the policy
statement or the data management report.
When Ms Helen Owens, Assistant Secretary of the Department of the Prime
Minister and Cabinet, was asked who the government consulted, she listed:
...organisations like Telstra, Google, the World Bank, the
[Australian Broadcasting Corporation], [software producer] IBM, [software
company] SAP. We also spoke with some research institutions—the Grattan
Institute and the Crawford school at ANU. [The Department of the Prime Minister
and Cabinet] then did some individual consultations with business leaders in
the data space and open data space.
The Office of the Australian Information Commissioner was nominally
consulted in the development of both the Public Sector Data Management
Report and the Public Data Policy Statement.
However, the government did not consult the National Health Performance
Authority (NHPA), the National E-Health Transition Authority (NEHTA) or the
Australian Commission on Safety and Quality in Health Care in the development
of either document.
2.55 Turning the report and the statement into a reality will take commitment
and perseverance, something previous governments have promised in this space
but not delivered.
As the Productivity Commission stated in its 2012-13 Annual Report:
Realising these goals [harnessing administrative data to
support research and evidence-based policy evaluation] requires political will,
articulated at the highest levels, to persevere with a concerted strategy with
clear timeframes based on the principle that open access to de-identified
information should be a default position. Realistically, it could take 5-10
years to rollout and embed systems before the ‘holy grail’ of relatively
unimpeded remote access to high quality, de-identified and linked administrative
data is achievable.
While there have been announcements and initiatives in the
past and more recently, the lack of sustained tangible progress means that it
is important that the 5-10 year timeframe does not become a motivation for more
‘false starts’, deferrals or eventual reprioritisation and non-delivery.
International practices and over thirty years of experience in Western
Australia suggest that the capabilities necessary to achieve a more open data
culture could be developed by all Australian governments.
The evidence presented to this committee demonstrates that Australia has
the potential to create a world leading data linkage system that can both
maintain data security and produce ground-breaking public health research.
The committee recognises that linking administrative data, which is
already routinely collected, has the potential to reveal new insights about the
ways Australians use the healthcare system and potential ways to improve the
health outcomes of all Australians.
The opportunities Australia is squandering are not just possibilities
for health improvements for future generations, but also the ability to detect
causes of harm to Australians. The committee has received evidence that
Australia could be using its data resources to detect harmful prescription
medications both in children and in adults. Instead, Australian researchers are
forced to rely on studies conducted in other countries where such drug safety
studies are possible. For the benefit of the health of all Australians we can
and must do better.
Improving our data linkage system involves breaking down some of the
historical barriers that have resulted from our federated system of government.
We have seen at sporadic intervals that such cooperation is possible and can
lead to highly beneficial outcomes.
Australia has the infrastructure and the knowledge to make a national data
linkage system work, but it will require legislative changes and cultural changes
in the Australian Public Service. The nature of these challenges will be
examined in greater detail in Chapters 3 and 4. These changes could catapult
Australia to become a world leader in data linkage.
The committee welcomes the renewed focus on Australia's data assets and
is encouraged by the attempt to coordinate efforts across government to make
more datasets available. But the committee notes that there is still a long way
to go to overcome many of the barriers currently faced by researchers and the
valid community concerns regarding privacy.
The committee further notes that this is not the first time an
Australian Government has promoted a more open approach to sharing data. The
committee is concerned at the very limited nature of the government's consultation
in developing its recent Australian Government Public Data Policy Statement and
its Public Sector Data Management Report. In compiling its most recent
policies, the government obtained very limited input from key stakeholders,
including those funded by the Australian Government. The failure to consult any
health professionals made it manifestly clear that the use of health data was
not a priority for the government. The committee is concerned by the low regard
in which the government seems to hold health data and the research groups that
work with it.
To ensure that the government's newly articulated approach to releasing
data maximises Australia's big health data potential, while attending to valid
community expectations about security and privacy around personal health data,
the government must broaden its data policy engagement to include health-related
academics, researchers and practitioners.