135 items found

Organisations: SoBigData Catalogue

  • Dataset

    Social Network dataset - LiveJournal

    LiveJournal is a free online blogging community where users declare friendship with each other. LiveJournal also allows users to form groups which other members can then join. We...
    • HTML: LiveJournal social network ...
  • Dataset

    ClueWeb09

    The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on...
  • Dataset

    Twitter Dumps

    The dataset consists of 10% of the daily stream of tweets produced on Twitter, filtered into 3 subsets: English, Italian, and geo-referenced. The tweets are a random sample of...
  • Dataset

    Twitter social bots

    Spambots are automated accounts (i.e., accounts driven by a bot) that repeatedly advertise unsolicited and often harmful content (e.g., malware, URLs to phishing Web sites,...
  • Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON: Broad Twitter Corpus
  • Dataset

    Twitter fake followers

    Fake followers are fake accounts massively created to follow a target account and that can be bought from online markets. In other words, their goal is that of increasing the...
  • Dataset

    .ee Web archive

    .ee Web archive consisting of snapshots from 2015
  • Method

    Measurement Expression Annotator

    Annotates numbers and measurement expressions in text. This method recognises many types of measurements including length, temperature, time and speed, and calculates their...
  • Method

    Digital DNA fingerprinting

    "Digital DNA fingerprinting" is a spambot detection technique based on the "Digital DNA" online behavioral modeling technique. Given a set of Twitter user timelines, it is...
  • Application

    SWAT

    SWAT is an entity-salience system which identifies on-the-fly the semantic focus of a document, expressed by its Salient Wikipedia Entities. The core of this technology is...
  • Method

    Twitter Opinion Mining English

    This tool recognises opinionated sentences in English tweets and it classifies them as positive or negative. It also indicates emotion type, author and target of the opinion,...
  • Method

    Summa Text Summarization (Es)

    The SUMMA Text Summarization (ES) uses the SUMMA toolkit developed by Horacio Saggion to provide a generic Spanish document summarizer.
  • Method

    GATE Cloud COVID-19 Misinformation Categoriser

    A machine learning classifier trained to categorise claims about COVID-19 into 10 categories proposed by the Reuters Institute for the Study of Journalism - Public authority...
  • Method

    DecarboNet Environmental Annotator

    The DecarboNet environmental annotation service identifies named entities, environmental terms, linguistic features and sentiment in social media texts.
  • Application

    WAT

    WAT is an entity linker, namely a tool that identifies meaningful substrings (called "spots") in an unstructured English text and links each of them to the unambiguous entity...
  • Method

    Part Of Speech Tagger For Tweets

    This service tags tweets with part-of-speech information, e.g. nouns and verbs.
  • Method

    Web Archive Collection Extractor

    This method extracts event-centric collections of Web Archives through a focused crawling method. The key of this method is to adapt focused Web crawling to previously collected...
    • PDF: Analyzing web archives ...
    • PDF: Extracting Event-Centric ...
    • GitHub: Source code
    • ZIP: dataset
  • Method

    GATE Cloud URL Domain Analysis

    Service that takes a list of URLs and reports, for each one, what multiple organisations that analyse the credibility of online content have said about the domain (or...
  • Method

    GATE Cloud Rumour Veracity Classifier

    User-generated content such as tweets often makes claims that are unsubstantiated and possibly untrue. This service attempts to classify whether a text is discussing a rumour...