
Data design thinking: data cleaning improvements using tableau prep

Citation

Felker, Christopher (2018), Data design thinking: data cleaning improvements using tableau prep, v6, DataONE Dash, Dataset, https://doi.org/10.15146/R3R68G

Abstract

Tableau Prep automatically surfaces errors and outliers in data and uses fuzzy clustering to help with common, repetitive tasks such as fixing spelling errors or reconciling entities across data sources. Tableau Prep (developed under the code name Project Maestro) shares the same calculation language and governance structure as the rest of Tableau, so you can get started easily. With streamlined sharing to Tableau Desktop, Tableau Server and Tableau Online, a user can move through data prep and analysis in one continuous flow.
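To make the idea of grouping near-duplicate spellings concrete, here is a minimal sketch in Python. It is not Tableau Prep's algorithm: it swaps in a simple greedy grouping based on the standard library's `difflib.SequenceMatcher`, and the function names, the 0.85 threshold and the sample values are all assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_spellings(values, threshold=0.85):
    """Greedy single-pass grouping (illustrative, not Tableau's method).

    Each value joins the first cluster whose first member is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def canonical_map(values, threshold=0.85):
    """Map every spelling to the most frequent spelling in its cluster."""
    mapping = {}
    for cluster in cluster_spellings(values, threshold):
        canonical = Counter(cluster).most_common(1)[0][0]
        for value in cluster:
            mapping[value] = canonical
    return mapping


# Hypothetical vendor-name column with a typo and a casing variant.
vendors = ["Experian", "Experain", "Experian", "Epic",
           "Epic Systems", "Healthlogic", "HealthLogic"]
for raw, clean in canonical_map(vendors).items():
    print(f"{raw!r} -> {clean!r}")
```

The threshold controls how aggressive the grouping is: a lower value merges more variants but risks joining distinct entities, which is why a tool that offers this kind of clustering typically lets the analyst review the proposed groups before applying them.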

In this first phase of the Enterprise implementation of Tableau at UCSD Health, we have focused on getting data from five vendors (Epic, Experian and Bank of America / Healthlogic) ready for Tableau work. The data sets we've created in Hyper have a zero date of 2013 10 01. They are also made to conform to SDMX standards whenever possible. SDMX stands for Statistical Data and Metadata eXchange and is an international initiative that aims at standardising and modernising (“industrialising”) the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries.

Current operations depend on a data file loosely termed an 'accounts trial balance'. This data is actually a blend of account-level information and a 'bucket', which is a software-generated construct. The bucket information relies on affinity analysis, a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In the case of UCSD Health billing accounts, the affinity is based on the patient, their insurance coverages and the split of fiscal responsibility.
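As a rough illustration of what affinity analysis computes, the Python sketch below counts how often pairs of attributes occur together across accounts and reports each pair's support (the share of accounts containing both). The account records, the coverage labels and the function name are hypothetical; this is not the bucket logic of the billing software, only a minimal instance of co-occurrence counting.

```python
from collections import Counter
from itertools import combinations


def pair_support(records):
    """Return {(item_a, item_b): support} for every co-occurring pair.

    Support is the fraction of records that contain both items.
    Pairs are stored in sorted order so (a, b) and (b, a) are one key.
    """
    total = len(records)
    counts = Counter()
    for items in records:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical accounts, each listing the payers attached to it.
accounts = [
    {"medicare", "medigap"},
    {"medicare", "medigap", "self_pay"},
    {"commercial", "self_pay"},
    {"medicare"},
]

support = pair_support(accounts)
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, value)
```

Pairs with high support are candidates for being grouped together; a fuller analysis would also look at measures such as confidence or lift before drawing conclusions.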

SDMX is sponsored by seven international organisations: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD), and the World Bank.

These organisations are the main players at world and regional levels in the collection of official statistics in a large variety of domains (agriculture statistics, health care, economic and financial statistics, social statistics, environment statistics etc.).

We've published a sequence of Adobe Spark guides that can be used to give UCSD employees recognition, in the form of digital badges, for learning that happens anywhere. A digital badge is an online representation of a data analysis skill that we are promoting and sharing with our sister UC medical centers.

References

Tableau 2018. Tableau Prep. <https://tabsoft.co/2vbfBG1> last accessed 2018 04 24.

SDMX 2018. Learning about SDMX basics. <http://bit.ly/2qvze5J> last accessed 2018 04 13.

Methods

The operational data has been defined in data structure definition specifications.

Reference copies that have been anonymised have been provided here and in Adobe Spark publications.

Usage Notes

dsd/043 dimension sdmx data structure definition exposure type

dsd/045 dimension sdmx data structure definition valuation method

uniform resource locator (url)
<http://bit.ly/2wFtGw8>

dataset
<CBD2 – consolidated banking data>

data structure definition
<statistics on consolidated banking data>

ECB_CBD2 agency
<ECB> 

download SDMX 2.1 schema of the ECB_CBD2 DSD
<http://bit.ly/2ImA7p3>

uc health / ucsd health dataset
<CCD1 - consolidated claim data>

data structure definition(s)
<dsd/047 attribute proprietary data structure definition claim 05280602 epic systems>

UCH_CCD1 agency
<0000 0001 2107 4242 ucsd health>

access to CCD1 is through the ucsd tableau server

Metrics based on this standard are developed by persons listed in this resource

d/416 2018 19 131 master organisation chart ucsd health patient financial services 0000 0001 2107 4242 ucsd health

Discovery metrics

Beta metrics

CCD Bm 0.0

Alpha metrics

CCD Am 0.0

Production metrics

CCD 0.0

Location

San Diego, CA 92121, USA