Tools We Need

Dumping ground for ideas for applications, services, and infrastructure needed for open knowledge development.

Distributed file systems

Formally this is just:

  1. A file-system distributed over a network

However, the term is also taken to include (as we shall do here):

  • replication both for:
    1. backup (having multiple copies of a given piece of data)
    2. distributed distribution (e.g. bittorrent)

Note that both types of replication will often involve 'chunking', that is, the division of a given file into multiple pieces that are distributed independently.
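As a rough illustration of chunking, here is a minimal Python sketch. It is not taken from any of the tools listed below; the function names `chunk_bytes` and `reassemble`, the fixed chunk size, and the use of SHA-256 digests to check each piece are all assumptions made for the example.

```python
import hashlib


def chunk_bytes(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size pieces, each tagged with its index and digest.

    chunk_size is tiny here only so the example is easy to follow.
    """
    chunks = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        piece = data[start:start + chunk_size]
        chunks.append({
            "index": index,
            "sha256": hashlib.sha256(piece).hexdigest(),
            "data": piece,
        })
    return chunks


def reassemble(chunks):
    """Verify every piece against its digest, then join them in index order."""
    ordered = sorted(chunks, key=lambda c: c["index"])
    for c in ordered:
        if hashlib.sha256(c["data"]).hexdigest() != c["sha256"]:
            raise ValueError(f"chunk {c['index']} is corrupt")
    return b"".join(c["data"] for c in ordered)


if __name__ == "__main__":
    pieces = chunk_bytes(b"open knowledge")
    # Pieces may arrive in any order from different peers.
    print(reassemble(list(reversed(pieces))))
```

Because each piece carries its own index and digest, pieces can be stored on or fetched from different peers independently and checked on arrival, which is what both the backup and the bittorrent-style use cases rely on.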

Existing Tools

  • http://allmydata.org/trac/tahoe

    • Welcome to allmydata.org "Tahoe": a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software, Open Source licence. This filesystem is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious.
  • osprey: http://osprey.ibiblio.org/

    • Osprey is a peer-to-peer enabled content distribution system. A metadata management system for software and document collections enables local and distributed searching of materials. Items are available for download directly via URL or indirectly via the BitTorrent peer-to-peer protocol.

  • Google File System:
  • Jetfile
  • http://www.nodezilla.net/

    • Technically, Nodezilla is a secured, distributed and fault tolerant routing system (aka Grid Network). Its main purpose is to serve as a link for distributed services built on top of it (like chat, efficient video multicasting streaming, File Sharing, secured file store ...). Nodezilla provides cache features; any server may create a local replica of any data object. These local replicas provide faster access and robustness to network partitions. They also reduce network congestion by localizing access traffic. It is assumed that any server in the infrastructure may crash, leak information, or become compromised, therefore in order to ensure data protection, redundancy and cryptographic techniques are used.
  • Pastry: http://research.microsoft.com/%7Eantr/Pastry/

    • Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.
    • Also has useful set of links to related projects
  • OceanStore: http://oceanstore.cs.berkeley.edu/

    • Providing Global-Scale Persistent Data
  • Hadoop file system: http://wiki.apache.org/lucene-hadoop/DFS

    • Part of the Hadoop project (see Distributed Processing/Services below)
    • Open Source (java based) with

Distributed Processing/Services

  • Hadoop: http://lucene.apache.org/hadoop/

    • Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
    • Based on distributing both data and processing over large numbers of nodes (and clusters)
  • Grub: http://grub.org/

    • distributed web crawling
    • Acquired by Wikia (and open sourced) in 2007-07 as part of Wikia's open search efforts

Timelines

One would like a tool that would display timelines given input with time tags. Two related concerns here:

  1. Specifying the input format (i.e. the time tags)
  2. Writing code that will then display the input
    • suggested output formats:
      1. html + js
      2. png
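The two concerns above can be sketched together in Python. Everything here is an assumption made for illustration: the input format (one `YYYY-MM-DD<TAB>label` pair per line, with `#` comment lines), the function names `parse_events` and `render_html`, and the choice of a plain ordered list as the HTML output. The sample events are made up.

```python
from datetime import date
from html import escape


def parse_events(text):
    """Parse 'YYYY-MM-DD<TAB>label' lines into (date, label) pairs, oldest first."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        stamp, label = line.split("\t", 1)
        events.append((date.fromisoformat(stamp), label.strip()))
    return sorted(events)


def render_html(events):
    """Render events as an ordered list; styling and js are left to the page."""
    items = "\n".join(
        f'  <li><time datetime="{d.isoformat()}">{d.isoformat()}</time> '
        f"{escape(label)}</li>"
        for d, label in events
    )
    return f'<ol class="timeline">\n{items}\n</ol>'


if __name__ == "__main__":
    sample = "# made-up sample data\n2008-05-01\tsecond sample event\n2006-08-01\tfirst sample event\n"
    print(render_html(parse_events(sample)))
```

Keeping parsing and rendering separate means a png renderer could later consume the same `(date, label)` pairs without touching the input format.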

Existing Software

Rendering of geo locations using **open data**

Would like a simple software package that renders locations on a map, with each location linked to text. It is essential that the underlying map data be open. It should also be deployable locally as well as on a server. Preference would be for a Python or JavaScript implementation.

See also: http://lists.okfn.org/pipermail/okfn-discuss/2006-August/000127.html and the resulting thread.

Also, as of May 2008, if you go to OpenStreetMap (http://openstreetmap.org/) and click on the new "Export" tab, you can generate a smidge of HTML (or static images or vectors or what have you) for importing an OSM map into an iframe on any web page.
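A small Python sketch of generating such an iframe snippet for a given point follows. The `export/embed.html` URL and its `bbox`, `layer`, and `marker` parameters are an assumption based on the embed code the OpenStreetMap site produces today, and may not match what the Export tab emitted in 2008. The function name `osm_iframe` and its defaults are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def osm_iframe(lat, lon, span=0.01, width=425, height=350):
    """Build an <iframe> tag showing an OSM map centred on (lat, lon).

    Assumed URL layout: bbox is min_lon,min_lat,max_lon,max_lat and
    marker is lat,lon.
    """
    bbox = f"{lon - span:.5f},{lat - span:.5f},{lon + span:.5f},{lat + span:.5f}"
    query = urlencode({
        "bbox": bbox,
        "layer": "mapnik",
        "marker": f"{lat:.5f},{lon:.5f}",
    })
    src = "https://www.openstreetmap.org/export/embed.html?" + query
    # escape() turns '&' into '&amp;' so the URL is valid inside an attribute.
    return f'<iframe width="{width}" height="{height}" src="{escape(src)}"></iframe>'


if __name__ == "__main__":
    print(osm_iframe(51.5, -0.1))
```

This only covers the "render a location" half of the requirement; linking each location to text would still need markup around the iframe or a JavaScript map library.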

Text annotation

What text annotation tools exist, and what 'api' do they require from the text? To take a concrete example: how would one annotate Shakespeare texts in a non-invasive manner, and what open source tools already exist? A demo implementation would be particularly useful.
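One common way to read 'non-invasive' is stand-off annotation: the text is never modified, and each annotation is stored separately as a pair of character offsets plus a note. The Python sketch below illustrates that idea under this assumption. It is not the API of any existing tool, and the names `Annotation`, `annotate`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    note: str


def annotate(text, quote, note):
    """Create an annotation on the first occurrence of quote in text."""
    start = text.index(quote)  # raises ValueError if quote is absent
    return Annotation(start, start + len(quote), note)


def resolve(text, ann):
    """Return the span of text that an annotation points at."""
    return text[ann.start:ann.end]


if __name__ == "__main__":
    line = "To be, or not to be, that is the question"
    ann = annotate(line, "that is the question", "sample note")
    print(ann.start, ann.end, resolve(line, ann))
```

In this sketch the only 'api' required from the text is that it is a stable sequence of characters. If the underlying edition changes, the offsets break, so a real tool would probably also need versioned texts or more robust anchors.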