Data Handbook

Dealing with data workflow

To deal with data effectively, I recommend following a workflow. Dealing with data requires clear and rational thinking, and a clear, rational process makes that thinking easier. This handbook supports that process: use this Dealing with data workflow together with the rest of the Data Handbook as a recipe for building your own data project.

The references in brackets show where in the Data Handbook you can find more information on each step of the workflow.

1. Do Research

    1.1 What does your data look like? (Take a look at your own dataset).

    1.2 What is the smallest element in your dataset? (Deconstruct to Reconstruct mindset, page 20).

    1.3 Which tool should you use? (Pros and cons sections for each tool, e.g., Excel page 25).

    1.4 Do you know how this tool works? (How to learn, how to install and run, e.g., Excel page 31).

    1.5 Do you know the basics of the tool? (Basics section, e.g., Excel page 33).
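
If your tool is a programming language, steps 1.1 and 1.2 can be done in code. The sketch below assumes Python (the handbook's own tool sections may work differently) and uses a small, made-up table; `smallest_key` is a hypothetical helper, not part of any library. It prints the first rows so you can look at the data, then searches for the shortest combination of columns that is unique for every row, which is one way to find the smallest element (the grain) of a dataset.

```python
from itertools import combinations

# Hypothetical sample data: sales per store, per date, per product.
rows = [
    {"store": "A", "date": "2024-01-01", "product": "apple", "sales": 10},
    {"store": "A", "date": "2024-01-01", "product": "pear", "sales": 4},
    {"store": "A", "date": "2024-01-02", "product": "apple", "sales": 7},
    {"store": "B", "date": "2024-01-01", "product": "apple", "sales": 3},
]

def smallest_key(rows, candidate_columns):
    """Return the shortest combination of columns whose values are unique per row."""
    for size in range(1, len(candidate_columns) + 1):
        for combo in combinations(candidate_columns, size):
            keys = [tuple(row[col] for col in combo) for row in rows]
            if len(set(keys)) == len(rows):
                return combo
    return None  # no combination is unique: the data contains duplicate rows

# 1.1 Look at the data: print the first few rows.
for row in rows[:3]:
    print(row)

# 1.2 Find the smallest element (the grain) of the dataset.
print(smallest_key(rows, ["store", "date", "product"]))  # ('store', 'date', 'product')
```

Here one row is one product, in one store, on one day. A combination can also be unique by coincidence in a small sample (add `"sales"` to the candidates and it comes out unique on its own), so check that the result matches what a row actually represents.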

2. Write Out or Draw the Large Steps (Building Blocks) of Your Data Problem

    2.1 Think of what information you need from your source data.

    2.2 Think about which Essentials you are going to use (Essentials section, e.g., Excel page 39).

3. Add All Data Together

    3.1 Combine all the data required (identified in steps 1 and 2) into one base dataset (Deconstruct to Reconstruct mindset, page 20).

    3.2 Convert the complete dataset to long format (Essentials – To long section, e.g., Excel page 55).
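
Steps 3.1 and 3.2 can be sketched in plain Python as follows, assuming two made-up sources: monthly sales per store in wide format (one column per month) and a region per store. `left_join` and `to_long` are hypothetical helpers written for this illustration.

```python
# Hypothetical source 1: monthly sales per store, wide format (one column per month).
sales = [
    {"store": "A", "jan": 10, "feb": 12},
    {"store": "B", "jan": 3, "feb": 5},
]
# Hypothetical source 2: extra information per store.
regions = [
    {"store": "A", "region": "North"},
    {"store": "B", "region": "South"},
]

def left_join(left, right, key):
    """3.1: add the columns of `right` to every row of `left` with the same key value."""
    lookup = {row[key]: row for row in right}
    # Rows without a match in `right` are kept unchanged.
    return [{**row, **lookup.get(row[key], {})} for row in left]

def to_long(rows, id_columns, var_name="variable", value_name="value"):
    """3.2: turn every non-id column into its own row (wide -> long)."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col not in id_columns:
                long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows

base = left_join(sales, regions, key="store")
long_data = to_long(base, id_columns=["store", "region"], var_name="month", value_name="sales")
for row in long_data:
    print(row)
```

The result has one row per store per month, with the month names moved from the column headers into a `month` column. If you work in pandas, `DataFrame.merge` (with `how="left"`) and `pandas.melt` cover the same two steps.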

4. Code One Building Block

    4.1 Code a building block from step 2 (Basics, Essentials and Extras sections, e.g., Excel pages 33, 39 and 61).

    4.2 Run the code and check whether the output matches your expectations.

    4.3 If the output doesn’t match your expectations, search Google or ask ChatGPT for answers (General – 7.1 Search and 7.2 ChatGPT).

    4.4 Update your understanding of the problem and the code.

    4.5 If possible, include a “test” in your code, such as an assertion, so that you can be sure the output stays exactly as you expect in the future (Extras – Assertions sections).

    4.6 Move on to the next step only once you have the output you expect and your understanding is up to date.
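
A minimal Python sketch of steps 4.2 and 4.5, assuming a hypothetical building block that totals sales per store from long-format rows. The idea is to run the block on a small input whose answer you can work out by hand, and then keep that check in the code as an assertion.

```python
def total_sales_per_store(long_rows):
    """Hypothetical building block: sum the sales per store from long-format rows."""
    totals = {}
    for row in long_rows:
        totals[row["store"]] = totals.get(row["store"], 0) + row["sales"]
    return totals

# 4.2 Apply the code to a small input where you know the answer in advance.
sample = [
    {"store": "A", "month": "jan", "sales": 10},
    {"store": "A", "month": "feb", "sales": 12},
    {"store": "B", "month": "jan", "sales": 3},
]
result = total_sales_per_store(sample)
print(result)  # {'A': 22, 'B': 3}

# 4.5 Keep the check as an assertion, so it fails loudly if a later change breaks the block.
assert result == {"A": 22, "B": 3}, f"unexpected totals: {result}"
```

Python skips `assert` statements when it runs with the `-O` flag, so for checks that must always run, raise an exception explicitly instead.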

5. Code the Next Building Block

    5.1 Repeat step 4 for the next building block.

    5.2 Keep repeating, stacking the building blocks one on top of the other, until the full data pipeline is complete.
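
Stacking building blocks (step 5.2) can be as simple as feeding the output of each block into the next one. The sketch below assumes Python and three hypothetical blocks, each of which takes a list of rows and returns a new list of rows.

```python
# Hypothetical building blocks, each developed and tested on its own (step 4).
def drop_missing(rows):
    """Remove rows without a sales value."""
    return [row for row in rows if row.get("sales") is not None]

def add_sales_in_thousands(rows):
    """Add a derived column with sales in thousands."""
    return [{**row, "sales_k": row["sales"] / 1000} for row in rows]

def sort_by_sales(rows):
    """Sort rows from highest to lowest sales."""
    return sorted(rows, key=lambda row: row["sales"], reverse=True)

# Step 5: the full pipeline is the list of blocks, applied in order.
PIPELINE = [drop_missing, add_sales_in_thousands, sort_by_sales]

def run_pipeline(rows, steps):
    """Pass the output of each block on as the input of the next block."""
    for step in steps:
        rows = step(rows)
    return rows

raw = [
    {"store": "A", "sales": 1500},
    {"store": "B", "sales": None},
    {"store": "C", "sales": 4200},
]
for row in run_pipeline(raw, PIPELINE):
    print(row)
```

Because every block has the same input and output shape, you can add, remove or reorder blocks in `PIPELINE` while you develop, and still test each block separately as in step 4.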

6. After All Building Blocks Are Developed, Finish Up the Code

    6.1 Look at the final output and create sanity checks to confirm that it is exactly as expected.

    6.2 Clean up the code so that its formatting matches the style guide (Style guide section, e.g., Excel page 59).

    6.3 If applicable, create a visualization of the output (General – 7.3 Visualizations, page 305).
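
For step 6.1, a sanity check compares the final output against facts you know independently of the code: the expected number of rows, keys that should be unique, no missing values, and totals that should match the source data. The Python sketch below shows one possible set of such checks; `sanity_check`, its parameters and the data are hypothetical.

```python
def sanity_check(rows, expected_rows, key_columns, value_column, expected_total):
    """Return a list of problems found in the output; an empty list means every check passed."""
    problems = []
    # Row count: does the output have as many rows as you expect?
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    # Uniqueness: the key columns should identify each row exactly once.
    keys = [tuple(row[col] for col in key_columns) for row in rows]
    if len(set(keys)) != len(keys):
        problems.append(f"duplicate keys on {key_columns}")
    # Completeness: no missing (None) values anywhere.
    missing = sum(1 for row in rows if any(value is None for value in row.values()))
    if missing:
        problems.append(f"{missing} row(s) with missing values")
    # Totals: the sum in the output should match the sum in the source data.
    total = sum(row[value_column] or 0 for row in rows)
    if total != expected_total:
        problems.append(f"total {value_column} is {total}, expected {expected_total}")
    return problems

final = [
    {"store": "A", "month": "jan", "sales": 10},
    {"store": "A", "month": "feb", "sales": 12},
    {"store": "B", "month": "jan", "sales": 3},
]
print(sanity_check(final, 3, ["store", "month"], "sales", 25))  # []

# A duplicated row is caught by the row-count, uniqueness and total checks.
broken = final + [final[0]]
for problem in sanity_check(broken, 3, ["store", "month"], "sales", 25):
    print(problem)
```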