3MEthTaskforce: Multi-source Multi-level Multi-token Ethereum Data Platform
3MEth Dataset Overview
Section 1: Token Transactions
This section provides 303 million transaction records from 3,880 tokens and 35 million users on the Ethereum blockchain. The data is stored in 3,880 CSV files, each representing a specific token. Each transaction includes the following information:
- Sender and receiver wallet addresses: Enables network analysis and user behavior studies.
- Token address: Links transactions to specific tokens for token-specific analysis.
- Transaction value: Reflects the number of tokens transferred, essential for liquidity studies.
- Blockchain timestamp: Captures transaction timing for temporal analysis.
In addition to the full dataset, we provide a smaller CSV file containing 267,242 transaction records from 29,164 wallet addresses. This subset covers 1,194 tokens and spans September 2016 to November 2023. Transaction data at this level of detail supports studies of user behavior and liquidity patterns, as well as tasks such as link prediction and fraud detection.
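As a minimal sketch of how the transaction fields above could feed a network analysis, the snippet below aggregates transfers into a weighted directed edge map and counts distinct receivers per sender. The column names (`from_address`, `to_address`, `token_address`, `value`, `timestamp`) and the sample rows are assumptions made for illustration; check the header of the actual CSV files before reusing it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and made-up rows; the real files may differ.
SAMPLE_CSV = """from_address,to_address,token_address,value,timestamp
0xaaa,0xbbb,0xt1,10.5,1500000000
0xaaa,0xccc,0xt1,2.0,1500000600
0xbbb,0xccc,0xt1,4.0,1500001200
0xaaa,0xbbb,0xt1,1.5,1500001800
"""


def build_transfer_graph(rows):
    """Sum transferred value per (sender, receiver) pair."""
    edges = defaultdict(float)
    for row in rows:
        edges[(row["from_address"], row["to_address"])] += float(row["value"])
    return dict(edges)


def out_degree(edges):
    """Count the distinct receivers each sender has transferred to."""
    counts = defaultdict(int)
    for sender, _receiver in edges:
        counts[sender] += 1
    return dict(counts)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
edges = build_transfer_graph(rows)
degrees = out_degree(edges)
```

For a real token file, the `io.StringIO` wrapper would be replaced by an open file handle. With 303 million records overall, processing one token file at a time is likely more practical than loading everything at once.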
Section 2: Token Information
This section offers metadata for the 3,880 tokens, stored in CSV files corresponding to the tokens. Each file contains:
- Timestamp: Marks the time of data update.
- Token price: Useful for price prediction and volatility studies.
- Market capitalization: Reflects the token's market size and dominance.
- 24-hour trading volume: Indicates liquidity and trading activity.
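The price field lends itself to simple volatility measures. The sketch below computes log returns and a rolling standard deviation over a fixed window. It assumes the prices are already ordered by timestamp and sampled at a regular interval, which the dataset description does not guarantee; the function names and sample prices are illustrative only.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over a sliding window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        statistics.stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Made-up price series, ordered by time.
prices = [100.0, 110.0, 121.0, 108.9]
returns = log_returns(prices)
vol = rolling_volatility(returns, window=2)
```

A window of 2 is used here only to keep the example small; a realistic study would pick a window that matches the sampling frequency of the price records.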
Section 3: Global Market Indices
This section provides macro-level data to contextualize token transactions, stored in separate CSV files. Key indicators include:
- Bitcoin dominance: Tracks Bitcoin's share of the cryptocurrency market.
- Total market capitalization: Measures the overall market's value, with breakdowns by token type.
- Stablecoin market capitalization: Highlights stablecoin liquidity and stability.
- 24-hour trading volume: A key measure of market activity.
These indices are essential for integrating global market trends into predictive models for volatility and risk-adjusted returns.
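One way to bring these indices into a token-level model is an as-of join: each token record is paired with the most recent index observation at or before its timestamp. The sketch below assumes both series use comparable numeric timestamps and that the index series is sorted. The tuple layout and the name `asof_join` are hypothetical and not part of the dataset tooling.

```python
from bisect import bisect_right


def asof_join(token_rows, index_rows):
    """Attach the latest index value at or before each token timestamp.

    token_rows: list of (timestamp, price) tuples.
    index_rows: list of (timestamp, value) tuples, sorted by timestamp.
    Token rows that precede every index observation get None.
    """
    index_times = [ts for ts, _value in index_rows]
    joined = []
    for ts, price in token_rows:
        pos = bisect_right(index_times, ts) - 1
        value = index_rows[pos][1] if pos >= 0 else None
        joined.append((ts, price, value))
    return joined


# Made-up values: a Bitcoin dominance series and a token price series.
btc_dominance = [(100, 0.50), (200, 0.60)]
token_prices = [(50, 1.0), (100, 1.1), (150, 1.2), (250, 1.3)]
joined = asof_join(token_prices, btc_dominance)
```

Looking backward only keeps future index values out of each record, which matters when the joined data is used for prediction.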
Section 4: Textual Indices
This section contains sentiment data from Reddit's Ethereum community, covering 7,800 top posts from 2014 to 2024. Each post includes:
- Post score (net upvotes): Reflects engagement and sentiment strength.
- Timestamp: Aligns sentiment with price movements.
- Number of comments: Gauges sentiment intensity.
- Sentiment indices: Sentiment scores computed using methods detailed in the data preprocessing section.
The full Reddit textual dataset is available upon request; please contact us for access. Alternatively, our open-source repository includes a tool to guide users in collecting Reddit data. Researchers are encouraged to apply for a Reddit API key and adhere to Reddit's policies.
This data helps characterize social dynamics in the market and supports sentiment analysis models aimed at explaining market movements and improving behavioral predictions.
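To align post-level sentiment with daily price data, the posts can be aggregated by calendar day. The sketch below computes a score-weighted daily mean. The weighting scheme (`max(score, 0) + 1`), the tuple layout, and the assumption that timestamps are Unix seconds in UTC are illustrative choices, not the preprocessing used to build the dataset.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_sentiment(posts):
    """Score-weighted mean sentiment per UTC day.

    posts: iterable of (unix_timestamp, post_score, sentiment) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # day -> [weighted sum, total weight]
    for ts, score, sentiment in posts:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        # Assumed weighting: downvoted posts still count with weight 1.
        weight = max(score, 0) + 1
        totals[day][0] += weight * sentiment
        totals[day][1] += weight
    return {day: weighted / weight for day, (weighted, weight) in totals.items()}


# Made-up posts: two on 1970-01-01 and one on 1970-01-02 (UTC).
posts = [(0, 1, 1.0), (3600, 3, 0.0), (86400, 0, -0.5)]
daily = daily_sentiment(posts)
```

The number of comments could be used as an alternative or additional weight; which choice better tracks market movements is an empirical question.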