[UPDATE 3, on July 15, 2011]: I solved the mystery of mySQL MAX(ID) queries timing out, though I don't know the root cause. Ideas?
I wrapped the query in question with a try/catch and sent the details to myself in an email with the UserID in the subject. I quickly saw a pattern that about 10 UserIDs kept showing up. I saw in the database that those accounts were for people that hadn't used Refynr for months, so I disabled their accounts and deleted their StreamItem data. Voila! The timeout errors are gone!
But I have no idea why. Some corrupt data? Some huge IDs that the system couldn't handle? Ideas?
[/UPDATE 3]
[UPDATE 4]
NOPE. That did NOT solve the problem. A couple of days later, presumably as the data grew again, the errors came back. I tried the trick from update 3 above, but new timeout errors kept coming up.
Next, I upgraded from MySQL 5.1 to 5.5. It didn't help.
Then, I even upgraded the server from 4G to 8G of RAM. That didn't help either.
I know I have poor database design, but I think in the long run I'm going to need something more scalable anyways, so I'm working on switching this table to MongoDB. So far, so good. It hasn't been too painful so far.
[/UPDATE 4]
---------------------------------------------------------------------------------------------------------------------------------
Okay, here's the situation:
- I have a table called StreamItems that collects tweets/posts from Twitter and Facebook
- The IDs from Twitter and FB are large numbers like 73054697145118720 and 100000057279452_212728265405748
- The database server is dedicated for MySQL 5.1 Community, has 4G RAM, and is Quad Core
- I've tuned the MySQL memory settings to match (I'm pretty sure I've done a decent job here, but could be wrong)
- I get timeout errors at times when trying to get a MAX() or doing an ORDER BY on these IDs
- Now that there's over 5 million Items saved, it can't handle the sorting necessary, so we need a better solution
My thought is to:
- Create a table called StreamItemMaxIDs that stores the max IDs for each Twitter/FB account
- Instead of scanning the whole StreamItems table for MAX or sorted IDs each time I need it, simply store the MAX ID in this separate table so that the query is fast every time
[I know that a NoSQL solution *might* be better (especially long-term), but am looking for a mySQL solution at the moment.]
So, should I keep the logic of updating the MAX IDs in the application-level code, or write a mySQL trigger? (Keep in mind I'm not a DBA)
Let me know if you need more info.
[UPDATE: here's the table schema]
[UPDATE 2: here's the dynamic SQL that times out under load]
