Parsing Generic CSV With BeanIO

BeanIO provides a simple way to read a CSV file whose columns and column data types are known at compile time. That is not always the case, at least not for every column.

Use cases

Think of an application that processes data sets. Some columns might be fixed, such as id, timestamp, and any data that is always collected. Others depend on which sensor data is collected. Or consider an import/export function for a customizable database application in which users can configure additional fields that extend a predefined set.

Approach

A simple approach would be to store each line as key-value pairs that map column name to value. However, an object with typed fields for the columns that are already known (e.g. timestamps) would be much nicer. The framework could then validate and convert the known columns during import, and only the new columns (unknown at compile time) would have to be interpreted later.
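Independent of BeanIO, the target shape of such an object can be sketched in plain Java. This is only an illustration: the class name SensorLine, the column names id and timestamp, and the hand-written fromRow method are assumptions made for this example and are not taken from the sample project, where the framework does the conversion.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one CSV line: typed fields for the known columns,
// a string map for everything else.
final class SensorLine {
    long id;                 // known column, parsed as a number
    Instant timestamp;       // known column, parsed as an ISO-8601 instant
    final Map<String, String> extras = new LinkedHashMap<>(); // unknown columns, kept as raw text

    // Builds a line from a header row and the matching value row.
    static SensorLine fromRow(String[] header, String[] values) {
        if (header.length != values.length) {
            throw new IllegalArgumentException("header and row differ in length");
        }
        SensorLine line = new SensorLine();
        for (int i = 0; i < header.length; i++) {
            switch (header[i]) {
                case "id":
                    line.id = Long.parseLong(values[i]);
                    break;
                case "timestamp":
                    line.timestamp = Instant.parse(values[i]);
                    break;
                default:
                    // Column unknown at compile time: keep it for later interpretation.
                    line.extras.put(header[i], values[i]);
            }
        }
        return line;
    }
}
```

Known columns fail fast on bad input (Long.parseLong and Instant.parse throw), while unknown columns pass through untouched.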

In BeanIO this is possible using type handlers. The basic idea is to have a class with typed fields for all known columns and one field with a map-like type for the remaining unknown columns. The type handler is then implemented so that it merges values into that field when reading and extracts them from it when writing.
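One way the merge and extract steps could fit together is sketched below, without a dependency on BeanIO. The names ExtraColumns, ExtraColumnHandler and LineBean are hypothetical, and the handler only loosely mirrors the parse/format pair of a BeanIO type handler; it does not implement the library's interface, and the sample project may solve this differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical map-like container for the unknown columns of one line.
final class ExtraColumns {
    private final Map<String, String> values = new LinkedHashMap<>();

    void put(String column, String value) { values.put(column, value); }
    String get(String column) { return values.get(column); }

    // Copies all entries of another container into this one.
    void merge(ExtraColumns other) { values.putAll(other.values); }
}

// Hypothetical per-column handler; one instance is bound to one column name.
final class ExtraColumnHandler {
    private final String column;

    ExtraColumnHandler(String column) { this.column = column; }

    // Reading: wrap the cell text in a single-entry container.
    ExtraColumns parse(String text) {
        ExtraColumns single = new ExtraColumns();
        single.put(column, text);
        return single;
    }

    // Writing: pull this handler's column out of the full container.
    String format(ExtraColumns all) {
        String value = all.get(column);
        return value == null ? "" : value;
    }
}

// Hypothetical line bean; the setter merges instead of overwriting.
final class LineBean {
    private final ExtraColumns extras = new ExtraColumns();

    ExtraColumns getExtras() { return extras; }
    void setExtras(ExtraColumns part) { extras.merge(part); }
}
```

Because the setter merges, several columns can be bound to the same bean field: each handler contributes its own entry on read and retrieves only its own entry on write.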

Sample project

This project contains an example illustrating this approach. CsvLine stores the data of a single line; GenericCsvEntries stores the values of the unknown columns of one line. CsvLineFormat helps create readers and writers for the CSV files. Custom type handlers are added depending on the column name. Each instance of GenericCsvEntriesHandler is responsible for one particular column.

The two test classes contain example code for reading and writing a CSV file.

In the example, only string values are stored. If the columns and their types are customizable, GenericCsvEntriesHandler could be enhanced to produce typed values.
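A possible starting point for typed values is a small registry that maps a column name to a converter. This is a sketch under assumptions: TypedColumnParsers is a hypothetical class that is not part of the sample project or of BeanIO, and how the column types are configured is left open.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: column name -> converter from cell text to a typed value.
final class TypedColumnParsers {
    private final Map<String, Function<String, ?>> parsers = new HashMap<>();

    void register(String column, Function<String, ?> parser) {
        parsers.put(column, parser);
    }

    // Falls back to the raw string when no converter is configured for the column.
    Object parse(String column, String text) {
        Function<String, ?> parser = parsers.get(column);
        return parser == null ? text : parser.apply(text);
    }
}
```

For example, after parsers.register("humidity", Double::valueOf), a per-column handler could call parsers.parse("humidity", text) and store the resulting Double instead of the raw text.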