In hydrology, as in many scientific fields, reading and writing various types of data files is a fundamental task. Common file formats include binary files, raster data, CSV tables, and NetCDF datasets. Efficient handling of these formats is essential for performing large-scale or high-resolution analyses.
Before diving into file operations, it’s important to understand indexing in Julia, especially because many data formats (e.g., raster or NetCDF) are structured as multidimensional arrays.
In Julia, arrays are one of the most commonly used data structures. Julia uses 1-based indexing, which is consistent with MATLAB and Fortran but different from Python or C, which start from 0. More importantly, Julia stores arrays in column-major order, which means that elements in a column are stored in contiguous memory locations.
This is particularly relevant when working with geospatial rasters or multidimensional scientific datasets, because the order in which data are read and written (row-wise vs. column-wise) affects performance and, if the axes are mixed up, the correctness of the result (see Chapter 3 for details).
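As a minimal sketch of the two points above, the following uses only Base Julia. The function names `sum_column_major` and `sum_row_major` and the small `grid` are illustrative, not taken from any package. Both functions return the same total, but the first visits elements in the order they are laid out in memory.

```julia
# Sum a matrix with the row index in the inner loop (memory order in Julia).
function sum_column_major(A::AbstractMatrix)
    total = zero(eltype(A))
    for j in axes(A, 2)          # columns in the outer loop
        for i in axes(A, 1)      # rows in the inner loop: contiguous access
            total += A[i, j]
        end
    end
    return total
end

# Same sum with the column index in the inner loop (strided access).
function sum_row_major(A::AbstractMatrix)
    total = zero(eltype(A))
    for i in axes(A, 1)          # rows in the outer loop
        for j in axes(A, 2)      # columns in the inner loop: jumps between columns
            total += A[i, j]
        end
    end
    return total
end

grid = reshape(collect(1:6), 2, 3)  # 2 rows, 3 columns, filled column by column
first_element = grid[1, 1]          # indexing starts at 1, not 0
second_in_memory = grid[2]          # linear index 2 is row 2 of column 1
```

On large grids the column-major loop is usually the faster of the two, although the size of the difference depends on the array size and hardware.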
Understanding indexing helps when:
Tips:
Binary files store raw data and are commonly used in hydrological models or satellite datasets for efficient storage. In Julia, you can read binary files using built-in functions such as open(), read(), read!(), and reinterpret().
# Example: Reading binary data
filename = "elevation_data.bin"
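data = Array{Float32}(undef, 1000, 1000) # Preallocate a 1000x1000 Float32 grid
open(filename, "r") do fid
    read!(fid, data) # Fill the grid from the file, in column-major order
end # The file is closed automatically when the block ends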
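A raw binary file carries no record of its own shape or element type, so both must be known in advance. The sketch below illustrates this with a round trip through a temporary file, using only Base Julia. The file path and the small `grid` are made up for the example, and the file is assumed to be written and read on the same machine, so byte order is not an issue.

```julia
path = tempname()                          # scratch file, for illustration only
grid = Float32[1.5 2.5 3.5; 4.5 5.5 6.5]   # 2x3 grid of Float32 values

open(path, "w") do io
    write(io, grid)                        # raw bytes, in column-major order
end

bytes = read(path)                         # whole file as a Vector{UInt8}
vals = reinterpret(Float32, bytes)         # view the same bytes as Float32 values
restored = reshape(vals, 2, 3)             # the shape must be supplied by the reader
```

If the file comes from another system or model, the element type, dimension order, and byte order should be checked against its documentation before trusting the values.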
Raster files (e.g., GeoTIFF) are widely used in hydrology for storing gridded data like elevation, rainfall, or land use. Julia provides support for raster files through packages like GeoArrays.jl and ArchGDAL.jl.
Raster files typically include metadata such as:
Understanding and preserving this metadata is critical for correct spatial analysis and visualization.
using GeoArrays
dataset = GeoArrays.read("dem.tif") # Read the GeoTIFF file, including its metadata
array = dataset.A[:, :, 1] # Extract the first band as a matrix
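To illustrate why georeferencing metadata matters, the sketch below converts a cell index into map coordinates. It assumes a north-up raster with square cells. The origin, cell size, and the helper `cell_center` are hypothetical values and names chosen for the example, not part of GeoArrays.jl or ArchGDAL.jl.

```julia
# Map coordinates of the centre of the cell at (col, row), with 1-based indices.
# x_origin and y_origin are the coordinates of the raster's upper-left corner.
function cell_center(col::Integer, row::Integer; x_origin, y_origin, cell_size)
    x = x_origin + (col - 0.5) * cell_size
    y = y_origin - (row - 0.5) * cell_size   # y decreases from top to bottom
    return (x, y)
end

# Hypothetical metadata: 30 m cells in a projected coordinate system
center = cell_center(1, 1; x_origin = 500000.0, y_origin = 4200000.0, cell_size = 30.0)
```

If the origin or cell size is lost or altered, every cell is placed at the wrong location even though the array values themselves are unchanged.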
CSV (Comma-Separated Values) is a simple text format for tabular data, commonly used for time series, station metadata, or hydrological measurements.
Use CSV.jl and DataFrames.jl for fast and flexible parsing:
using CSV, DataFrames
df = CSV.read("streamflow.csv", DataFrame)
first(df, 5)
To write data:
CSV.write("output.csv", df)
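Hydrological records often contain gaps. The following sketch writes and re-reads a small table with one missing value, assuming that CSV.jl and DataFrames.jl are installed. The column names, the values, and the "NA" marker are invented for the example.

```julia
using CSV, DataFrames, Dates

# Hypothetical daily streamflow record with one gap
df = DataFrame(
    date = Date(2020, 1, 1):Day(1):Date(2020, 1, 4),
    flow = [12.3, missing, 10.8, 11.5],
)

path = tempname() * ".csv"                   # scratch file, for illustration only
CSV.write(path, df; missingstring = "NA")    # write gaps as the text NA

# Tell the parser which text marks a gap, so the column stays numeric
df2 = CSV.read(path, DataFrame; missingstring = "NA")

n_missing = count(ismissing, df2.flow)
mean_flow = sum(skipmissing(df2.flow)) / count(!ismissing, df2.flow)
```

Using `skipmissing` keeps the gap out of the average without altering the table itself.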
Tips:
NetCDF (Network Common Data Form) is a widely used format for storing multidimensional scientific data, particularly in climate modeling, reanalysis, and remote sensing.
Use the NCDatasets.jl package:
using NCDatasets
ds = NCDataset("precipitation.nc", "r")
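precip = Array(ds["pr"]) # Read the entire variable, keeping its dimensions
times = Array(ds["time"]) # Read the time axis (named times to avoid shadowing Base.time)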
close(ds)
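Large NetCDF variables do not have to be loaded in full. As a sketch, assuming NCDatasets.jl is installed, the code below creates a small file and then reads only the first time step together with an attribute. The file path, the dimension names, the variable contents, and the units are made up for the example.

```julia
using NCDatasets

path = tempname() * ".nc"                   # scratch file, for illustration only
pr = reshape(Float32.(1:24), 2, 3, 4)       # hypothetical lon x lat x time grid

# Create the file; defVar also defines the named dimensions from the data size
NCDataset(path, "c") do ds
    defVar(ds, "pr", pr, ("lon", "lat", "time");
           attrib = Dict("units" => "mm/day"))
end

# Reopen read-only and load a single time step instead of the whole variable
first_step, units, dims = NCDataset(path, "r") do ds
    v = ds["pr"]
    (v[:, :, 1], v.attrib["units"], size(v))
end
```

The `do` block form closes the dataset automatically, even if an error occurs while reading.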
Why NetCDF is powerful:
Best practices:
Always document: