HOME




Cryptocurrency Price Data Collection

Comprehensive Market Data Harvesting: 800,000+ Data Points Across 5 Exchanges





Cryptocurrency Price Data Collection

1 Public Exchange API Integration


Cryptocurrency markets operate through specialized trading platforms called exchanges, which function as digital marketplaces for buying and selling digital assets. Unlike traditional financial institutions, these exchanges operate 24/7 with prices fluctuating continuously based on real-time supply and demand dynamics, often updating every few seconds.


Modern cryptocurrency exchanges provide public APIs (Application Programming Interfaces) that enable programmatic access to real-time market data. For instance, to retrieve the current Bitcoin price from the Kraken exchange, you can access the following endpoint directly in your browser:

https://api.kraken.com/0/public/Ticker?pair=BTCEUR

This API call returns comprehensive market data including three critical price points:

  • last: The price of the most recent completed transaction
  • bid: The highest price buyers are willing to pay (sell price for holders)
  • ask: The lowest price sellers are willing to accept (buy price for purchasers)

2 Programmatic Data Collection


The real power of public APIs lies in their ability to be accessed programmatically, enabling automated data collection at scale. This capability is essential for systematic market analysis and algorithmic trading strategies.

Below is a practical example using the R programming language to retrieve Bitcoin market data from Kraken and format it into a structured table:

# Load required packages
library(tidyverse)

# Establish connection and retrieve market data
library(RCurl)
api_endpoint <- "https://api.kraken.com/0/public/Ticker?pair=BTCEUR"
raw_data <- getURLContent(api_endpoint)

# Parse JSON response and structure data
require(jsonlite)
parsed_data <- fromJSON(raw_data)$result[[1]]
market_data <- data.frame(
  ask = parsed_data$a[1], 
  bid = parsed_data$b[1], 
  last = parsed_data$c[1], 
  open = parsed_data$o, 
  low = parsed_data$l[1], 
  high = parsed_data$h[1], 
  volume = parsed_data$v[1], 
  volumeQuote = NA, 
  timestamp = NA
)

# Display results
ask bid last open low high volume
95753 95752 95743 96137 94375 96209 90

3 Comprehensive Function Library


To facilitate systematic data collection across multiple exchanges, I have developed a comprehensive function library that standardizes API interactions across five major cryptocurrency exchanges. This library provides unified access to market data regardless of the underlying exchange’s API structure.

Usage Example:

# Load the standardized API functions from GitHub repository
source("https://raw.githubusercontent.com/ymju86/Crypto-Arbitrage/master/FUNCTIONS/Public_Market_Functions.R")

# Retrieve Bitcoin price from Bitstamp exchange
get_bitstamp(Sys.time(), "BTCEUR")

4 Large-Scale Data Harvesting Operation


This research project involved an extensive data collection campaign with the following specifications:

  • Duration: 14-day continuous monitoring period (February 5-19)
  • Cryptocurrencies: 5 major digital assets - Bitcoin (BTC), Bitcoin Cash (BCH), Ethereum (ETH), Litecoin (LTC), and Ripple (XRP)
  • Exchanges: 5 leading platforms - Coinbase, Kraken, Bitstamp, Bitfinex, and CEX.io
  • Data Points: Over 800,000 individual price observations
  • Frequency: Real-time data collection every few seconds

The data harvesting was implemented using an automated collection system that continuously called the API functions described above. The complete implementation script is available here.


Data Access: The complete dataset is publicly available on GitHub in compressed format. You can load and analyze the data using the following R code:

# Load the comprehensive market dataset
load("../DATA/public_ticker_harvest.Rdata")  

# Examine the dataset structure
head(Ticker)


Data Visualization Preview: As a demonstration of the dataset’s richness, here’s the Ethereum price evolution on Bitstamp during the collection period:

# Load the comprehensive dataset
load("../DATA/public_ticker_harvest.Rdata")

# Create price evolution visualization
library(hrbrthemes)
Ticker %>%
  filter( symbol == "ETHEUR" ) %>%
  filter(platform == "Bitstamp") %>%
  ggplot( aes(x=time, y=as.numeric(last))) +
    geom_ribbon(aes(ymin=450, ymax=as.numeric(last)),  fill="#69b3a2", color="transparent", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ggtitle("Ethereum Price Evolution During Data Collection Period") +
    ylab("Ethereum Price (EUR)") +
    theme_ipsum() +
    theme(
      plot.title = element_text(size=12)
    )

5 Next Phase: Market Analysis


With this comprehensive dataset in hand, the next phase involves quantitative analysis of price differences between exchanges. If these differences prove significant enough, they may present viable arbitrage opportunities.


Analyze Price Differences →