Data Processing and Analysis

OpenRefine

Notepad++

WinMerge

Data organisation in spreadsheets

Introduction to data cleaning with OpenRefine

Exercises for the Introduction to data cleaning

  1. Import the .csv file and create a project
  2. Change the case of the text in one column, and convert the values in the date column to the date type. Both options are under Edit cells > Common transforms.
  3. Try out text and timeline facets and the text filter. 
  4. Split a column into two or more columns. 
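For readers who want to see what these steps do to the data, the following is a minimal Python sketch of the same operations outside OpenRefine. The column names (`name`, `date`) and the sample rows are made up for illustration; they are not taken from the workshop file.

```python
import csv
import io
from datetime import date

# Made-up sample standing in for the imported .csv file (assumption).
SAMPLE_CSV = "name,date\nalice SMITH,2021-03-04\nbob JONES,2020-11-30\n"


def load_rows(text):
    """Step 1: read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_row(row):
    """Steps 2 and 4: change case, convert text to a date, split a column."""
    name = row["name"].title()                # change the case to title case
    parsed = date.fromisoformat(row["date"])  # text -> date (ISO format assumed)
    first, last = name.split(" ", 1)          # split one column into two
    return {"first": first, "last": last, "date": parsed}


rows = [clean_row(r) for r in load_rows(SAMPLE_CSV)]
```

Once the date column holds real date values rather than text, it can be sorted and grouped by time, which is what the timeline facet in step 3 relies on.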

Exercise 1

  1. Import the Wallonia dataset. 
  2. When you're done, click the green checkmark under Reactions in Zoom.

Exercise 2

  1. Find values in the Label column containing the text "farm".
  2. What is the most common value in the column Province for these?
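The logic behind this exercise (filter first, then count the values of another column) can be sketched in Python as follows. Only the column names `Label` and `Province` come from the exercise; the rows are invented placeholders, and the case-insensitive match is an assumption about how the filter is set up.

```python
from collections import Counter

# Invented rows for illustration; not the actual Wallonia data.
rows = [
    {"Label": "Old farm (ruin)", "Province": "Namur"},
    {"Label": "Castle farm", "Province": "Liège"},
    {"Label": "Town hall", "Province": "Namur"},
    {"Label": "Farmhouse", "Province": "Namur"},
]


def most_common_province(rows, needle):
    """Keep rows whose Label contains needle, then count Province values."""
    matches = [r for r in rows if needle.lower() in r["Label"].lower()]
    counts = Counter(r["Province"] for r in matches)
    # most_common(1) returns a list with one (value, count) pair.
    return counts.most_common(1)[0] if counts else None


result = most_common_province(rows, "farm")
```

In OpenRefine the equivalent is a text filter on Label combined with a text facet on Province, sorted by count.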

Exercise 3

  1. Split the geopoint column.
  2. Rename the columns.
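As an illustration of what the split and rename do, here is a small Python sketch. It assumes that a geopoint value is a single string of the form "latitude, longitude" separated by a comma; the actual separator in the dataset may differ. The new column names `latitude` and `longitude` are also only a suggestion.

```python
def split_geopoint(value, separator=","):
    """Split a 'lat, lon' string into two floats; return (None, None) if malformed."""
    parts = value.split(separator, 1)
    if len(parts) != 2:
        return None, None
    return float(parts[0].strip()), float(parts[1].strip())


# Hypothetical row; the coordinates are placeholder values.
row = {"geopoint": "50.4669, 4.8675"}

# Split the column, then store the parts under descriptive names.
lat, lon = split_geopoint(row.pop("geopoint"))
row["latitude"], row["longitude"] = lat, lon
```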

Exercise 4

  1. Split the Label column on "(" into at most 2 columns.
  2. Split the multi-valued cells in Label (or Label 1) using "-" as the separator.
  3. Trim whitespace and change the text to lowercase.
  4. Join multi-valued cells.
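The four steps above can be sketched for a single cell in Python. The sample label and the "; " separator used when joining are assumptions chosen for illustration.

```python
def clean_label(label, join_separator="; "):
    """Apply the four steps of the exercise to one Label value."""
    # Step 1: split on "(" into at most 2 parts and keep the first part.
    main = label.split("(", 1)[0]
    # Step 2: split the multi-valued cell on "-".
    values = main.split("-")
    # Step 3: trim whitespace and lowercase each value, dropping empty ones.
    values = [v.strip().lower() for v in values if v.strip()]
    # Step 4: join the cleaned values back into a single cell.
    return join_separator.join(values)


# Hypothetical label, not taken from the dataset.
cleaned = clean_label("Farm - Barn (18th century)")
```

Splitting before trimming matters here: the whitespace around "-" only becomes leading or trailing whitespace once the values are separated.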

Exercise 5

  1. Replace the text in the History column so that only the years remain.
  2. Split multi-valued cells.
  3. Transform to date.
  4. Try out the timeline facet.
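A rough Python equivalent of the first three steps is shown below. The regular expression assumes that years are four-digit numbers between 1000 and 2099, and mapping each year to 1 January is an assumption made so that a bare year can be stored as a date. The sample sentence is invented.

```python
import re
from datetime import date

# Four-digit years from 1000 to 2099 (assumed range).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def years_as_dates(history):
    """Extract every year from free text and convert each one to a date."""
    years = YEAR.findall(history)                 # steps 1-2: keep and separate the years
    return [date(int(y), 1, 1) for y in years]    # step 3: transform to date


# Hypothetical History value.
dates = years_as_dates("Built in 1750, restored 1923 and 2004.")
```

With the values stored as dates, they can be placed on a time axis, which is what the timeline facet in step 4 displays.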