EnviroStream
This repository contains datasets, queries and a generator for EnviroStream, a benchmark for Stream Reasoning (SR) systems. SR focuses on applying inference to dynamic...
High Performance and Scalable Analytics Module
Mining with big data, or big data mining, has become an active research area. Running current analytical methodologies and software tools on a single personal computer cannot...
- Introduction to Parallel ... (PDF)
- Introduction to Hadoop (PDF)
- Hadoop Patterns (PDF)
- Remote Connection and HDFS (PDF)
- Exercises for Remote ... (ZIP)
- Introduction to Spark (PDF)
- Exercises for Introduction ... (ZIP)
- Introduction to Spark SQL (PDF)
- Exercises for Introduction ... (ZIP)
- Hadoop Ecosystem and ... (PDF)
- Data Mining with Spark (MLlib) (PDF)
- Exercises for Data Mining ... (ZIP)
Interactive Learning Environments
King’s College London developed a variety of data science materials based on R and Python. R is a de facto standard in statistical computing and visualisation, while our...
Business Data Analytics Course
The training material provided by the University of Tartu covers the Business Data Analytics course. The course is intended to give students hands-on experience in solving...
- Introduction (PDF)
- Visualisation (PDF)
- Customer Segmentation (PDF)
- Customer Lifecycle Management (PDF)
- Customer Lifecycle ... (PDF)
- A/B Testing (PDF)
- Cross Selling and Upselling (PDF)
- Process Mining (PDF)
Data Management for Business Intelligence Module
This module provides an introduction to information storage and management in support of organizations' business decisions. It is part of the Master in Big...
- Introduction to Data ... (PDF)
- Data Analysis using ... (PDF)
- Extract, Transform and ... (PDF)
- On-Line Analytical ... (PDF)
Gene Disease Association Data and Features
This dataset contains data that can be used for disease gene discovery purposes. The data cover ten different diseases with associated seed genes (derived from DisGeNET) and...
- Gene_Disease_Association_Da ... (RAR)
The Hackernews dataset
This corpus has been extracted from The Hacker News website (https://thehackernews.com), a CS news platform that attracts over 8 million readers monthly, which is daily...
- World Trade Web_2000 (CSV)
Carbon Trade Network_2000
Weighted, directed adjacency matrix of the Carbon Trade Network in the year 2000.
- CTN_adj_2000 (CSV)
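A weighted, directed adjacency matrix stores the weight of the link from node i to node j in cell (i, j), so row sums give each node's total outgoing weight (out-strength) and column sums its total incoming weight (in-strength). The sketch below illustrates this on a made-up 3-node matrix. The CSV layout (a header row of node labels, a first column of node labels, rows as sources and columns as targets), the node names A, B, C and the helper functions are all hypothetical assumptions, not the documented format of CTN_adj_2000 or CTN_adj_2020.

```python
import csv
import io

# Hypothetical layout (assumption, not confirmed by the dataset page):
# first row = target labels, first column = source labels,
# cell (i, j) = weight of the directed edge i -> j.
SAMPLE = """,A,B,C
A,0,2.5,0
B,1.0,0,4.0
C,0,0.5,0
"""


def load_adjacency(text):
    """Parse a labelled square adjacency matrix from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    labels = rows[0][1:]
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        source = row[0]
        matrix[source] = {target: float(w) for target, w in zip(labels, row[1:])}
    return labels, matrix


def strengths(labels, matrix):
    """Return (out_strength, in_strength): row sums and column sums of weights."""
    out_s = {n: sum(matrix[n].values()) for n in labels}
    in_s = {n: sum(matrix[m][n] for m in labels) for n in labels}
    return out_s, in_s


def edges(labels, matrix):
    """List the non-zero weighted directed edges as (source, target, weight)."""
    return [(s, t, matrix[s][t]) for s in labels for t in labels if matrix[s][t] != 0]


if __name__ == "__main__":
    labels, m = load_adjacency(SAMPLE)
    out_s, in_s = strengths(labels, m)
    print(out_s)  # {'A': 2.5, 'B': 5.0, 'C': 0.5}
    print(in_s)   # {'A': 1.0, 'B': 3.0, 'C': 4.0}
    print(edges(labels, m))
```

For a real file, the same parser could be fed from `open(path, newline="")` instead of the in-memory string, once the actual row/column convention of the dataset has been checked.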
Carbon Trade Network_2020
Weighted, directed adjacency matrix of the Carbon Trade Network in the year 2020.
- CTN_adj_2020 (CSV)
Private Identified CNVs from whole exome sequencing data of BRCA1/2 negative breast c...
This dataset offers a comprehensive analysis of Copy Number Variations (CNVs) identified in Whole Exome Sequencing (WES) data from patients with breast cancer who tested...
- WTN_adj_2020 (CSV)
Synthetic Dataset for Causal Analysis
The dataset is a synthetic version of the well-known German Credit dataset (https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data). It includes variables such as...
- synthetic german data (CSV)
SWH Filenames
A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...
- SWH Filenames (ZIP)
DeLag: Microservices execution traces
The dataset contains execution traces collected from the well-known open-source microservices system Train-ticket. The traces are generated over a variety of scenarios,...
- Unnamed resource (parquet)
Physical activity, quality of sleep, and quality of life in Italy: the long t...
From March 2020 to May 2021, several lockdown periods caused by the COVID-19 pandemic limited, to varying degrees of severity, people’s usual activities and mobility in...
- dataset and code (ZIP)
Thyroid-cancer patients
The data used originate from the web-based database of the Italian Thyroid Cancer Observatory (ITCO), opened in 2013 at the Thyroid Cancer Center of the Sapienza University of...
Scientific Publications Dataset
This is the sciMAG2015 dataset, i.e., the open dataset linking Microsoft Academic Graph and sciMAGO's journal classification for bibliometric studies. It includes publication...
- sciMAG2015 - Data
Soccer Team Performance
The dataset contains the performance features (passes, shots, goals, tackles, etc.) of soccer teams during the games of six major European leagues in three seasons. The dataset...
Formal network of Estonian companies and board members
This dataset consists of managed and continuously updated data about Estonian companies and board members since 1994. Technical documentation of data structures and the REST API...
