Data design thinking: data cleaning improvements in the beta testing of project maestro
Felker, Christopher (2018), Data design thinking: data cleaning improvements in the beta testing of project maestro, v2, DataONE Dash, Dataset, https://doi.org/10.15146/R3R68G
Project Maestro automatically shows errors and outliers in data and employs fuzzy clustering to help you with common, repetitive tasks like fixing spelling errors or reconciling entities across data sources. Project Maestro shares the same calc language and governance structure as the rest of Tableau, so you can get started easily. And with a streamlined sharing experience to Tableau Desktop, Tableau Server and Tableau Online, a user can experience data prep and analysis in one continuous flow.
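This section does not describe Project Maestro's own clustering algorithm. The following minimal Python sketch therefore only illustrates the general idea of grouping near-duplicate spellings. Values are normalised, then greedily clustered with the standard library's `difflib.SequenceMatcher` similarity ratio, and each cluster is mapped to its most frequent spelling. This string-similarity method is a stand-in technique, not the tool's actual method. The function names, the 0.85 threshold and the sample values are illustrative assumptions.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def normalise(value: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", value.lower())
    return " ".join(text.split())


def cluster_values(values: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy leader clustering (hypothetical helper): each value joins the first
    cluster whose first member is at least `threshold` similar, else starts a new one."""
    clusters: list[list[str]] = []
    for value in values:
        key = normalise(value)
        for cluster in clusters:
            if SequenceMatcher(None, key, normalise(cluster[0])).ratio() >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def build_replacements(values: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map every raw value to the most frequent spelling in its cluster
    (ties go to the spelling seen first)."""
    mapping: dict[str, str] = {}
    for cluster in cluster_values(values, threshold):
        canonical = Counter(cluster).most_common(1)[0][0]
        for value in cluster:
            mapping[value] = canonical
    return mapping


# Hypothetical sample values; not drawn from the dataset.
raw = ["Bank of America", "bank of america", "Bank of Amercia",
       "Experian", "Experain", "Experian", "Epic"]
print(build_replacements(raw))
```

Each value is compared only with a cluster's first member, so the grouping depends on input order. That trade-off keeps the sketch short.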
In this first phase of the Enterprise implementation of Tableau at UCSD Health, we have focused on getting data from three vendors (Epic, Experian and Bank of America) ready for Tableau work. The data sets we've created in Hyper have a zero date of 2013 10 01. They are also made to conform to SDMX standards whenever possible. SDMX stands for Statistical Data and Metadata eXchange and is an international initiative that aims at standardising and modernising (“industrialising”) the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries.
SDMX is sponsored by seven international organisations: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (the Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistical Division (UNSD), and the World Bank.
These organisations are the main players at world and regional levels in the collection of official statistics in a large variety of domains (agriculture statistics, health care, economic and financial statistics, social statistics, environment statistics etc.).
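In SDMX, a series is identified by a key made of its dimension values. These values appear in the order fixed by the data structure definition and are separated by dots, as in the SDMX REST data query syntax. The short Python sketch below makes "conforming to SDMX" more concrete by building such a key. It also rejects codes that are not in a dimension's codelist. FREQ and its D/M/A codes follow the usual SDMX frequency codelist. The SOURCE and MEASURE dimensions and their codes are hypothetical and echo the three vendors named above. They are not the published UCSD Health definitions.

```python
# Hypothetical dimensions, listed in the order a DSD would fix them.
# FREQ codes follow common SDMX usage; SOURCE and MEASURE are illustrative only.
DIMENSIONS: list[tuple[str, frozenset[str]]] = [
    ("FREQ", frozenset({"D", "M", "A"})),                 # daily, monthly, annual
    ("SOURCE", frozenset({"EPIC", "EXPERIAN", "BOFA"})),
    ("MEASURE", frozenset({"ENCOUNTERS", "CHARGES"})),
]


def series_key(values: dict[str, str]) -> str:
    """Join dimension codes with '.' in DSD order, checking each codelist."""
    parts = []
    for dim_id, codes in DIMENSIONS:
        code = values[dim_id]  # raises KeyError if a dimension is missing
        if code not in codes:
            raise ValueError(f"{code!r} is not in the codelist for {dim_id}")
        parts.append(code)
    return ".".join(parts)


print(series_key({"FREQ": "M", "SOURCE": "EPIC", "MEASURE": "ENCOUNTERS"}))
# prints M.EPIC.ENCOUNTERS
```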
We've published a sequence of Adobe Spark guides that can be used to give UCSD employees recognition, through digital badges, for learning that happens anywhere. A digital badge is an online representation of a data analysis skill, and we are promoting and sharing these skills with our sister UC medical centers.
Tableau (2018). Project Maestro. <https://tabsoft.co/2vbfBG1>, last accessed 2018 04 13.
SDMX (2018). Learning about SDMX basics. <http://bit.ly/2qvze5J>, last accessed 2018 04 13.
The operational data has been defined in data structure definition (DSD) specifications. Anonymised reference copies are provided here and in the Adobe Spark publications.
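A data structure definition can be used to check a reference copy before it is shared. The Python function below is a sketch of such a check, testing one row against a minimal DSD. It flags:

- columns the DSD does not declare, such as a leftover identifier that should have been removed during anonymisation;
- missing components;
- codes outside a codelist;
- time periods that do not parse as ISO 8601 calendar dates;
- non-numeric observation values.

TIME_PERIOD and OBS_VALUE follow common SDMX component naming. The SOURCE and MEASURE dimensions, their codes, and the assumption of daily observations are hypothetical.

```python
from datetime import date

# Hypothetical DSD. TIME_PERIOD and OBS_VALUE follow common SDMX naming;
# the SOURCE/MEASURE dimensions and their codes are illustrative only.
DSD = {
    "dimensions": {
        "SOURCE": {"EPIC", "EXPERIAN", "BOFA"},
        "MEASURE": {"ENCOUNTERS", "CHARGES"},
    },
    "time_dimension": "TIME_PERIOD",
    "primary_measure": "OBS_VALUE",
}


def validate_row(row: dict[str, str], dsd: dict) -> list[str]:
    """Return a list of problems; an empty list means the row fits the DSD."""
    problems: list[str] = []
    time_dim, measure = dsd["time_dimension"], dsd["primary_measure"]
    expected = set(dsd["dimensions"]) | {time_dim, measure}

    # Undeclared columns (e.g. a leftover identifier) suggest incomplete anonymisation.
    for col in sorted(set(row) - expected):
        problems.append(f"unexpected column {col!r}")
    for col in sorted(expected - set(row)):
        problems.append(f"missing column {col!r}")

    # Every dimension value must come from its codelist.
    for dim, codes in dsd["dimensions"].items():
        if dim in row and row[dim] not in codes:
            problems.append(f"{row[dim]!r} not in codelist for {dim}")

    # Assumes daily data: TIME_PERIOD should parse as an ISO 8601 calendar date.
    if time_dim in row:
        try:
            date.fromisoformat(row[time_dim])
        except ValueError:
            problems.append(f"{time_dim} {row[time_dim]!r} is not an ISO 8601 date")

    # The primary measure must be numeric.
    if measure in row:
        try:
            float(row[measure])
        except ValueError:
            problems.append(f"{measure} {row[measure]!r} is not numeric")

    return problems


print(validate_row({"SOURCE": "EPIC", "MEASURE": "CHARGES",
                    "TIME_PERIOD": "2013-10-01", "OBS_VALUE": "12.5"}, DSD))
```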