MetaReader

http://jannah.github.io/MetaReader/

Read the full paper

Exploratory Data Analysis (EDA) of a dataset can be an interesting experience. It can also be frustrating, especially if the analyst is unfamiliar with the dataset. Such frustrations arise from issues related to data quality, arrangement, missing values, or unclear labeling. Some datasets come with documentation, such as README files, that explains their contents, but even this is sometimes insufficient. Another source of frustration is ambiguity about the expected outcomes of an EDA activity. Questions and hypotheses can be valuable guides when starting an EDA activity. However, analysts are sometimes given a task without any such guidance and asked to “find something interesting”.

Analysts approach these issues in different ways. Some jump directly into visualizing the dataset using a visual analytics tool. Others prefer to open the dataset in a text or spreadsheet viewer and scan through it to learn about the data and formulate questions or hypotheses before moving on to visual or statistical analysis.

Another challenge with EDA is dataset documentation. Clear, robust documentation is highly beneficial for analysts, but producing it is a cumbersome task for data creators who want to release their datasets.

MetaReader is a meta-exploration and documentation tool for datasets. It is designed to facilitate three tasks associated with EDA: learning, documentation, and sharing.

Learning

MetaReader helps analysts jump-start their EDA activity by providing simple visualizations, statistics, and insights about the dataset. These insights help users learn about the contents of the dataset, such as the data type of each column and the shape (or distribution) of its values. They also help surface potential data quality issues, such as missing or mixed-type values. Finally, these insights can help analysts formulate questions and hypotheses for their EDA activity.
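To make the kind of per-column insight described above concrete, here is a minimal sketch in Python. It is not MetaReader's implementation: the function names `infer_type` and `profile_column`, the set of missing-value markers, and the three type labels are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed markers for a missing cell; real datasets may use others.
MISSING = {"", "na", "n/a", "null", "none", "nan"}


def infer_type(value):
    """Classify a raw string cell as 'missing', 'integer', 'float', or 'string'."""
    text = value.strip()
    if text.lower() in MISSING:
        return "missing"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize one column: missing count, type counts, and a mixed-type flag."""
    counts = Counter(infer_type(v) for v in values)
    missing = counts.pop("missing", 0)
    dominant = counts.most_common(1)[0][0] if counts else None
    return {
        "count": len(values),
        "missing": missing,
        "type_counts": dict(counts),
        "dominant_type": dominant,
        # More than one non-missing type suggests a data quality issue.
        "mixed_types": len(counts) > 1,
    }


profile = profile_column(["1", "2", "", "x", "NA"])
```

In this example the column is mostly integers, has two missing cells, and is flagged as mixed because of the stray `"x"`, which is the sort of hint that could prompt a question about the data.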

Documentation

For data creators, documenting the content of a dataset would add great value to the dataset and benefit analysts. For analysts, documenting their initial thoughts, questions, and hypotheses about the dataset would help them keep track of their progress and also retain a record of their work. MetaReader facilitates documentation by providing its users with three distinct free text entry fields for each column in the dataset: descriptions, notes, and questions.
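The three per-column fields could be modeled roughly as follows. This is a hypothetical sketch, not MetaReader's actual data model: the class names `ColumnDoc` and `DatasetDoc` and the `column` helper are invented for illustration, and only the three field names come from the description above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ColumnDoc:
    """Free-text documentation attached to a single column."""
    name: str
    description: str = ""
    notes: str = ""
    questions: str = ""


@dataclass
class DatasetDoc:
    """Documentation for a dataset, keyed by column name."""
    title: str
    columns: dict = field(default_factory=dict)

    def column(self, name):
        """Return the record for a column, creating an empty one if needed."""
        return self.columns.setdefault(name, ColumnDoc(name))


doc = DatasetDoc("Survey responses")
doc.column("age").description = "Respondent age in years"
doc.column("age").questions = "Why are some values negative?"
record = asdict(doc)  # plain dicts, convenient for saving later
```

Keeping the three fields separate lets a data creator fill in descriptions while an analyst adds notes and questions to the same record.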

Sharing

MetaReader provides several ways to save the information generated by the tool along with the information users have entered. Analysts can use this to share their thoughts, questions, and answers with others. Data creators can use it as an easy way to share good documentation (e.g. README files) for their datasets.
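As an illustration of what saving and sharing might look like, the sketch below writes per-column documentation to JSON and to a README-style text file. The source does not specify MetaReader's export formats, so both formats, the function names `to_json` and `to_readme`, and the layout of the output are assumptions.

```python
import json


def to_json(title, columns):
    """Serialize the documentation so it can be reloaded or exchanged."""
    return json.dumps({"title": title, "columns": columns}, indent=2)


def to_readme(title, columns):
    """Render the documentation as Markdown-style README text."""
    lines = [f"# {title}", ""]
    for name, info in columns.items():
        lines.append(f"## {name}")
        # Only emit the fields the user actually filled in.
        for key in ("description", "notes", "questions"):
            text = info.get(key, "")
            if text:
                lines.append(f"- {key.capitalize()}: {text}")
        lines.append("")
    return "\n".join(lines)


columns = {
    "age": {
        "description": "Respondent age in years",
        "questions": "Why are some values negative?",
    }
}
readme = to_readme("Survey responses", columns)
saved = to_json("Survey responses", columns)
```

The JSON form suits exchanging work between analysts, while the README form suits a data creator publishing documentation alongside the dataset.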