The University of Auckland

3MEthTaskforce: Multi-source Multi-level Multi-token Ethereum Data Platform

Version 2 2025-01-15, 14:02
Version 1 2025-01-15, 03:02
dataset
posted on 2025-01-15, 14:02 authored by Haoyuan Li, Mengxiao Zhang, Maoyuan Li, Jianzheng Li, Shuangyan Deng, Zijian Zhang, Jiamou Liu

3MEth Dataset Overview

Section 1: Token Transactions

This section provides 303 million transaction records from 3,880 tokens and 35 million users on the Ethereum blockchain. The data is stored in 3,880 CSV files, each representing a specific token. Each transaction includes the following information:

  • Sender and receiver wallet addresses: Enables network analysis and user behavior studies.
  • Token address: Links transactions to specific tokens for token-specific analysis.
  • Transaction value: Reflects the number of tokens transferred, essential for liquidity studies.
  • Blockchain timestamp: Captures transaction timing for temporal analysis.

In addition to the full dataset, we provide a smaller CSV file containing 267,242 transaction records from 29,164 wallet addresses. This subset involves 1,194 tokens and covers September 2016 to November 2023. Transaction data at this level of detail supports studies of user behavior and liquidity patterns, as well as tasks such as link prediction and fraud detection.
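As an illustration of the network analysis these fields enable, the following minimal sketch aggregates transfers into a weighted directed graph. The column names (`from`, `to`, `token_address`, `value`, `timestamp`) and the sample rows are assumptions made for the example, not the dataset's documented schema; check the header row of the actual CSV files before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and made-up rows; the real files may differ.
SAMPLE_CSV = """from,to,token_address,value,timestamp
0xaaa,0xbbb,0xtoken1,10.0,1500000000
0xaaa,0xccc,0xtoken1,5.5,1500000600
0xbbb,0xccc,0xtoken1,2.0,1500001200
"""


def build_transfer_graph(rows):
    """Sum transferred value per (sender, receiver) pair."""
    edges = defaultdict(float)
    for row in rows:
        edges[(row["from"], row["to"])] += float(row["value"])
    return dict(edges)


def out_degree(edges):
    """Count distinct receivers for each sender."""
    degree = defaultdict(int)
    for sender, _receiver in edges:
        degree[sender] += 1
    return dict(degree)


# For a real file, replace io.StringIO(...) with open(path, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
edges = build_transfer_graph(rows)
degrees = out_degree(edges)
```

The resulting edge map can be handed to a graph library for tasks such as link prediction; reading rows lazily rather than with `list(...)` is advisable for the larger per-token files.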

Section 2: Token Information

This section offers metadata for 3,880 tokens, stored in corresponding CSV files. Each file contains:

  • Timestamp: Marks the time of data update.
  • Token price: Useful for price prediction and volatility studies.
  • Market capitalization: Reflects the token's market size and dominance.
  • 24-hour trading volume: Indicates liquidity and trading activity.
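A volatility study on the price field might start from log returns, as in the sketch below. The column names (`timestamp`, `price`, `market_cap`, `volume_24h`) and the sample values are assumptions for illustration only.

```python
import csv
import io
import math
import statistics

# Hypothetical header and made-up values; verify against the real files.
SAMPLE_CSV = """timestamp,price,market_cap,volume_24h
1700000000,1.00,1000000,50000
1700086400,1.10,1100000,60000
1700172800,0.99,990000,55000
1700259200,1.05,1050000,52000
"""


def log_returns(prices):
    """Return ln(p[t] / p[t-1]) for each consecutive pair of prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
prices = [float(row["price"]) for row in rows]
returns = log_returns(prices)

# Sample standard deviation of log returns as a simple volatility estimate.
volatility = statistics.stdev(returns)
```

Log returns are additive over time, so their sum equals the log of the ratio between the last and first price; this makes them convenient for aggregating across update intervals.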

Section 3: Global Market Indices

This section provides macro-level data to contextualize token transactions, stored in separate CSV files. Key indicators include:

  • Bitcoin dominance: Tracks Bitcoin's share of the cryptocurrency market.
  • Total market capitalization: Measures the overall market's value, with breakdowns by token type.
  • Stablecoin market capitalization: Highlights stablecoin liquidity and stability.
  • 24-hour trading volume: A key measure of market activity.

These indices are essential for integrating global market trends into predictive models for volatility and risk-adjusted returns.
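One way to attach a global index to individual transactions is an as-of lookup: for each transaction, take the most recent index value at or before its timestamp. The sketch below assumes Unix timestamps and uses made-up Bitcoin dominance values; the field names are hypothetical.

```python
import bisect

# (unix_timestamp, btc_dominance_percent), sorted by timestamp; values are made up.
index_series = [
    (1700000000, 51.2),
    (1700086400, 51.8),
    (1700172800, 52.1),
]


def asof_lookup(series, ts):
    """Return the latest value at or before ts, or None if ts precedes the series."""
    times = [t for t, _value in series]
    pos = bisect.bisect_right(times, ts)
    return series[pos - 1][1] if pos else None


# Hypothetical transaction records with a Unix timestamp field.
transactions = [
    {"timestamp": 1700000500, "value": 3.0},
    {"timestamp": 1700090000, "value": 1.5},
    {"timestamp": 1699999999, "value": 2.0},
]

enriched = [
    dict(tx, btc_dominance=asof_lookup(index_series, tx["timestamp"]))
    for tx in transactions
]
```

Looking backward only avoids leaking future market information into features for a predictive model. For large inputs, build the `times` list once outside the function rather than on every call.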

Section 4: Textual Indices

This section contains sentiment data from Reddit's Ethereum community, covering 7,800 top posts from 2014 to 2024. Each post includes:

  • Post score (net upvotes): Reflects engagement and sentiment strength.
  • Timestamp: Aligns sentiment with price movements.
  • Number of comments: Gauges sentiment intensity.
  • Sentiment indices: Sentiment scores computed using methods detailed in the data preprocessing section.

The full Reddit textual dataset is available upon request; please contact us for access. Alternatively, our open-source repository includes a tool to guide users in collecting Reddit data. Researchers are encouraged to apply for a Reddit API key and adhere to Reddit's policies.


This data is valuable for understanding social dynamics in the market and enhancing sentiment analysis models that can explain market movements and improve behavioral predictions.
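To align post-level sentiment with daily price data, the posts can be aggregated per calendar day. The sketch below uses a score-weighted mean, which is an illustrative choice and not the method used to build the dataset's sentiment indices; the field names (`created_utc`, `score`, `num_comments`, `sentiment`) and values are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical post records with made-up values.
posts = [
    {"created_utc": 1700000000, "score": 120, "num_comments": 40, "sentiment": 0.6},
    {"created_utc": 1700003600, "score": 30, "num_comments": 5, "sentiment": -0.2},
    {"created_utc": 1700090000, "score": 50, "num_comments": 12, "sentiment": 0.1},
]


def daily_weighted_sentiment(records):
    """Score-weighted mean sentiment per UTC calendar day."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for post in records:
        day = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date().isoformat()
        # Clamp the weight to at least 1 so low or negative scores still count.
        weight = max(post["score"], 1)
        weighted_sum[day] += weight * post["sentiment"]
        weight_total[day] += weight
    return {day: weighted_sum[day] / weight_total[day] for day in weighted_sum}


daily = daily_weighted_sentiment(posts)
```

The resulting per-day values can then be joined to the token price series on the date key.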


Publisher

University of Auckland