I'm upgrading the architecture of Refynr and trying to decide on a new primary database platform, specifically for large volumes of tweets, which I plan to store as JSON from the Twitter API.

At the moment I'm using MySQL, with nearly 4 million tweets stored.

The new requirements will be:

  1. able to store up to 500 million tweets over the next 12 months
  2. also able to store 100 million Facebook posts
  3. and able to store 10 million RSS feed items
  4. clusterable for HA
  5. free, or nearly free
  6. established, proven platform (nothing super-new or Alpha)
  7. scalable for both high write and high read volumes:
    • there will be a Refynr API that feeds TweetDeck, HootSuite, and perhaps a few other clients that support Twitter API-compatible feeds
    • up to 100K writes per minute, as data is pulled in from Twitter, FB & RSS
  8. Nice to haves:
    • code libraries already built, that connect via CFML or Java
    • fairly simple to set up (I'm new to the NoSQL world)
    • can easily import data from MySQL tables
    • cloud server-compatible: preferably either Rackspace CloudServers, AWS, or GAE
    • runs on Linux
    • open-source
    • easy to test/run on Mac OS X
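
For a rough sense of scale, here is a back-of-envelope sketch of the numbers above in Python. The item counts and the peak write rate come straight from the list; the average JSON size per item (`AVG_ITEM_BYTES`) is purely an assumption for illustration, not something I've measured, and the function names are just made up for this sketch.

```python
# Back-of-envelope sizing from the requirements list.
# AVG_ITEM_BYTES is an assumed value, not a measurement.

TARGET_ITEMS = {
    "tweets": 500_000_000,
    "facebook_posts": 100_000_000,
    "rss_items": 10_000_000,
}
PEAK_WRITES_PER_MINUTE = 100_000
AVG_ITEM_BYTES = 2_500  # assumption: average size of one stored JSON document


def total_items(targets):
    """Total number of stored items across all sources."""
    return sum(targets.values())


def peak_writes_per_second(writes_per_minute):
    """Convert the per-minute write target to writes per second."""
    return writes_per_minute / 60


def raw_storage_gib(item_count, avg_bytes):
    """Raw JSON payload size in GiB, ignoring indexes and replication."""
    return item_count * avg_bytes / 1024**3


if __name__ == "__main__":
    items = total_items(TARGET_ITEMS)
    print(f"total items: {items:,}")
    print(f"peak writes/sec: {peak_writes_per_second(PEAK_WRITES_PER_MINUTE):,.0f}")
    print(f"raw storage: {raw_storage_gib(items, AVG_ITEM_BYTES):,.0f} GiB")
```

The storage figure only covers raw JSON under that assumed size; indexes, replication for HA, and per-platform overhead would all add to it.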

I have my own ideas about what to use, but I don't want to sway your opinion, so I'm asking an open-ended question:

What would you use? What is your experience with your recommendation? And what are your reasons?