Cross-Lingual Dataset of Crisis-Related Social Media
If you use this dataset, please cite the following paper: Fedor Vitiugin, Carlos Castillo: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive... -
DBLP Network
The DBLP computer science bibliography provides a comprehensive list of research papers in computer science. This dataset is a co-authorship network constructed upon the DBLP...-
HTML
-
The Italian Music Dataset
The dataset is built using the Spotify and SoundCloud APIs. It comprises over 14,500 songs by both famous and lesser-known Italian musicians. Each song...-
JSON
-
Ukraine-related Disinformation Dataset
Ukraine-related disinformation dataset from "Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation" (accepted at SocInfo... -
GERDAQ Dataset
This is a benchmark dataset of annotated search-engine queries. Mentions of entities in search-engine queries are tagged with the entity they refer to. Wikipedia is used as...-
XML
-
Facebook - New Orleans regional network
This dataset contains information about 90,269 users and 3,646,662 friendship links between those users. These users belong to the New Orleans Facebook regional network. The...-
HTML
-
German Academic Web
The dataset contains regular crawls of the websites for German academic institutions. -
MSN Search query log
The data consists of an MSN Search query log excerpt with 15 million queries, from US users, sampled over one month of activity. Data attributes made available per query: 1)... -
A dataset of gamers on Twitter
This gaming-related dataset consists of 8932 users (labeled as gamers) engaging in game-related conversations. We have collected (June 2018) their timeline (the most recent 3200... -
Product Reviews for Ordinal Quantification
This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification... -
Wikipedia Word Embeddings
Embeddings were created by applying word2vec skip-gram to a corpus of non-stub Wikipedia articles from a December 2015 English dump with the following parameters: -cbow 0... -
Cherenkov Telescope Data for Ordinal Quantification
This labeled data set is targeted at ordinal quantification. It appears in our research paper "Ordinal Quantification Through Regularization", which we have published at... -
Learning to quantify: LeQua 2022 datasets
The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual... -
Wikinews dataset
This dataset consists of a sample of 365 news articles published by Wikinews from November 2004 to June 2014 and annotated with about 5000 entities, each associated with a saliency...-
JSON
-
UCR Time Series Classification Archive
The archive contains many interesting datasets, including a gesture dataset featuring the same two actors, but recorded 16 years apart! (we are nothing if not patient!). The...
