
COVID–19 Data Set Modeling and Analytics

The Starschema COVID–19 Data Set

The Starschema COVID-19 Data Set collates a range of important resources for assessing the impact and severity of the COVID-19 pandemic and the response to it. The data is available free of charge, both as flat files in AWS S3 and as a shareable data source on the Snowflake Data Marketplace. Detailed information about the content of the data set is available in the project’s GitHub repository. The METADATA table in the Snowflake Data Marketplace share contains detailed column-level information about the tables that make up the share, as well as comments from data originators that help users make sense of the data.

As a wide range of organizations, from NGOs and governments to public health authorities and enterprises, struggle to adapt to this new world, the Starschema COVID-19 Data Set can provide accurate, up-to-date intelligence to support real-time, data-driven decision-making. This single source of truth is “analytics-ready” and integrates with other data sources, so you can analyze the progression of the COVID-19 pandemic over time, in any context. Because the data is aligned along widely used identifiers (e.g., ISO 3166 geographies), data from disparate sources can be unified easily, sparing users the work of reconciling sources that often use different identifiers or definitions.
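To illustrate why shared identifiers matter, here is a minimal Python sketch that joins two sources on an ISO 3166-1 alpha-2 country code. The records, the field name `iso3166_1`, and the helper `join_on_iso` are hypothetical and chosen for illustration; they are not the data set’s actual schema.

```python
from typing import Dict, List

# Hypothetical external source: daily case counts keyed by ISO 3166-1 alpha-2 code.
cases: List[dict] = [
    {"iso3166_1": "HU", "date": "2020-05-01", "cases": 100},
    {"iso3166_1": "US", "date": "2020-05-01", "cases": 5000},
]

# Hypothetical internal source: office headcount keyed by the same identifier.
offices: List[dict] = [
    {"iso3166_1": "HU", "employees": 120},
    {"iso3166_1": "GB", "employees": 40},
]


def join_on_iso(left: List[dict], right: List[dict], key: str = "iso3166_1") -> List[dict]:
    """Inner-join two lists of records on a shared ISO 3166 code.

    Assumes each code appears at most once in `right`; a later duplicate
    would overwrite an earlier one in the lookup index.
    """
    index: Dict[str, dict] = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


if __name__ == "__main__":
    for joined in join_on_iso(cases, offices):
        print(joined)
    # → {'iso3166_1': 'HU', 'date': '2020-05-01', 'cases': 100, 'employees': 120}
```

Because both sources use the same code, no fuzzy matching of country names is needed; US and GB drop out only because each appears in just one source.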

While we included the most reliable and trustworthy sources, not all data is created equal. This is particularly true of data reported by countries and states that have unequal access to resources, define the same metric differently, or have governments that influence reporting to reflect well on their political agendas. Analyzing the data and building models that you can have confidence in, and that provide real analytical value for better decisions, therefore takes work.

Data modeling

New relevant and reliable data sources will be added to this data set as they become available, and the data set will be continually updated and revised. The pandemic is a moving target and will remain so for the foreseeable future. A good model can help identify trends, alert us when things are changing, show us how fast they are changing, and do so at various levels of detail.
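One simple way for a model to flag change, and how fast it is happening, is to smooth daily counts with a moving average and compare the latest smoothed value with the one a week earlier. The sketch below assumes a 7-day window and a hypothetical 10% alert threshold; the function names are illustrative, and a production model would also need to account for reporting delays and noise.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int = 7) -> List[float]:
    """Trailing moving average; the first window-1 days have no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def weekly_growth(values: List[float], window: int = 7) -> Optional[float]:
    """Relative change of the latest smoothed value vs. one window earlier.

    Returns None when there is too little data or the earlier value is zero.
    """
    smoothed = rolling_mean(values, window)
    if len(smoothed) <= window or smoothed[-1 - window] == 0:
        return None
    return smoothed[-1] / smoothed[-1 - window] - 1.0


def trend_alert(values: List[float], window: int = 7, threshold: float = 0.10) -> bool:
    """True if smoothed counts grew by more than `threshold` (hypothetical 10%)."""
    growth = weekly_growth(values, window)
    return growth is not None and growth > threshold


if __name__ == "__main__":
    daily = [10] * 7 + [20] * 7  # illustrative daily new-case counts
    print(weekly_growth(daily))  # → 1.0
    print(trend_alert(daily))    # → True
```

Smoothing first keeps day-of-week reporting effects from triggering the alert on their own.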

These models, and the visualizations and dashboards they power, can be particularly helpful in evaluating:

  • supply chain dynamics
  • demand planning
  • HR and location vulnerabilities
  • financial impacts

By integrating the Starschema COVID-19 Data Set with other related data, both internal and external, executives and managers can better understand the impacts at a deeper level and make business-critical, data-driven decisions based on answers to newly relevant questions:

  • What geographic areas are affected, how badly, when will they begin to normalize, and how quickly?
  • What areas are at risk?
  • What areas are threatened by a possible recurrence?
  • How are government policies affecting each area and what effect will potential future policy changes have on the business?
  • Who in your organization is at risk and how does this risk affect the capabilities of the organization?
  • Who can be reassigned to ensure the most important functions and projects aren’t impacted?
  • How is working from home affecting projects, for better and for worse?
  • Which projects are at risk now and which are likely to be in the future?
  • How does working from home impact operational costs?
  • What supply chains, distribution centers, and customer channels are at risk now, and which are likely to become at risk in the future?

The Starschema COVID–19 Data Set Modeling and Analytics Solution

Collating, curating, and unifying the data gave us a nuanced understanding of the data, its biases, and how best to work with it to gain meaningful insights. Our solution teams are led by a senior data scientist with long-standing expertise in clinical epidemiology and the analysis of viral outbreaks.

Key Benefits

  • Greater visibility of the factors affecting your business
  • Accelerated decision-making process
  • Ability to react rapidly to changing events

Key features

Snowflake access with 300 credits

The Snowflake Data Marketplace, a secure, fully governed platform for sharing and exchanging data, allows Starschema to share COVID-19 data in near real time. Public and private sector organizations can connect to the Data Marketplace from within their own Snowflake account to integrate the COVID-19 incidence data set seamlessly and benefit from fast query processing.

Access to Starschema COVID–19 Data Set via S3

For use cases that do not require a data warehouse, the data sets are available as flat CSV files at fixed endpoints on Amazon’s S3 storage service. This allows bulk downloads and easy use in the customer’s tool of choice.

Data integrations with internal data sources

Starschema integrates data from your key source systems with the COVID-19 data set and, through a consultation, provides context about the data set.

Data modeling

Leveraging your key source system data, Starschema’s data science team can build models that answer the questions you need answered to make crucial decisions.

Data visualization and dashboard creation

Real-time interactive visualizations and dashboards built on Tableau, Mapbox, Plotly, Power BI, and other tools reveal key data as it changes to facilitate quick, data-driven decision-making.

The Starschema difference

Experience in large environments

Fortune 100 companies rely on Starschema to keep their data pipelines robust, resilient, and reliable. Fortune 500 companies, governmental organizations, and NGOs have trusted our experts to visualize operations-critical data in a clear and accessible manner.

Complete data lifecycle management

From ingestion to consumption, our teams of database administrators, data engineers, ETL developers, application developers, and data visualization experts provide a seamless solution for your complete data pipeline.

Flexible service models

Starschema offers platform design, platform management, and DataOps, either for entire multi-vendor data pipelines or for specific components.

Proven onboarding methodology

With standard processes for deployment, knowledge transfer, and integration with ticketing systems, Starschema ensures faster time to value.

Tools-based approach

Starschema deploys open source and proprietary frameworks, methodologies and tools to provide effective, accurate, and repeatable solutions and services.

Ask the Expert

Chris von Csefalvay

Kristof Csefalvay is Starschema's VP for Special Projects, having previously served as Principal Data Scientist at Starschema. A data scientist with over 10 years' experience, he has pioneered AI approaches in epidemiology, earth observation, and digital signal processing. Educated at Oxford and Cardiff, he has worked in data science roles for companies across Europe and the Americas and holds a number of patents in the fields of machine learning, AI, and DSP.

Working with the COVID-19 Data Set

During this time of crisis, everyone is searching for answers. Governments, healthcare institutions, non-governmental organizations, and businesses large and small urgently need to make decisions about their future. We believe they should be armed with accurate, easily accessible, analytics-ready data. That’s why we collated, curated, and unified the most credible and reliable public data sets into a single data set that serves as a single source of truth.

COVID-19 Case Count Trajectory Starter Dashboard

Everyone wants to know when America can once again open up for business. This publicly available interactive data visualization shows the trajectory of cases and the trajectory of positive tests as a percentage of total tests, giving citizens, enterprises, and NGOs a view of each state’s progress toward meeting the quantitative gating criteria.
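Test positivity is positive tests divided by total tests. The Python sketch below computes it per day and checks for a downward trajectory over the most recent 14 days, the period used in the federal gating criteria. Defining “downward” as a negative least-squares slope is an assumption for illustration, not necessarily the method behind the dashboard, and the function names are hypothetical.

```python
from typing import List


def positivity(positive: List[int], total: List[int]) -> List[float]:
    """Positive tests as a percentage of total tests, day by day."""
    if any(t <= 0 for t in total):
        raise ValueError("total tests must be positive on every day")
    return [100.0 * p / t for p, t in zip(positive, total)]


def slope(values: List[float]) -> float:
    """Ordinary least-squares slope of values against day index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def downward_trajectory(values: List[float], days: int = 14) -> bool:
    """True if the trend over the last `days` values is negative (assumed rule)."""
    window = values[-days:]
    if len(window) < days:
        return False  # not enough history to judge a 14-day trajectory
    return slope(window) < 0


if __name__ == "__main__":
    total = [1000] * 14                            # illustrative daily test volumes
    positive = [100 - 5 * i for i in range(14)]    # illustrative positive tests
    rates = positivity(positive, total)
    print(downward_trajectory(rates))              # → True
```

A least-squares slope is less sensitive to a single noisy day than simply comparing the first and last values of the window.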

COVID–19 Data Set Modeling and Analytics

During times of crisis, companies must examine the available data, both internal and external, and work out how it can be used to determine how the business is being affected now, how it is likely to be affected in the future, which scenarios are most likely to play out, and what can be done to counter those scenarios and take advantage of hidden opportunities in this rapidly changing environment. The Starschema COVID-19 Data Set ingests reliable data from multiple sources and makes it analytics-ready so that it can be easily accessed and used.