Solo Web Startup

A solo entrepreneur's 1st startup

  • @Refynr
  • My startup: Refynr.com
  • Me on LinkedIn
    • 22
      22 Jun 2011

      MySQL database design question: Trigger or not?

      • Edit
      • Delete
      • Tags
      • Autopost

      [UPDATE 3, on July 15, 2011]: I solved the mystery of mySQL MAX(ID) queries timing out, though I don't know the root cause. Ideas?

      I wrapped the query in question with a try/catch and sent the details to myself in an email with the UserID in the subject. I quickly saw a pattern that about 10 UserIDs kept showing up. I saw in the database that those accounts were for people that hadn't used Refynr for months, so I disabled their accounts and deleted their StreamItem data. Voila! The timeout errors are gone!

      But I have no idea why. Some corrupt data? Some huge IDs that the system couldn't handle? Ideas?

      [/UPDATE 3]

      [UPDATE 4]

      NOPE. That did NOT solve the problem. A couple of days later, presumably as the data grew again, the errors came back. I tried the trick from update 3 above, but new timeout errors kept coming up.

      Next, I upgraded from MySQL 5.1 to 5.5. It didn't help.

      Then, I even upgraded the server from 4G to 8G of RAM. That didn't help either.

      I know I have poor database design, but I think in the long run I'm going to need something more scalable anyways, so I'm working on switching this table to MongoDB. So far, so good. It hasn't been too painful so far.

      [/UPDATE 4]

      ---------------------------------------------------------------------------------------------------------------------------------

       

       

      Okay, here's the situation:

      1. I have a table called StreamItems that collects tweets/posts from Twitter and Facebook
      2. The IDs from Twitter and FB are large numbers like 73054697145118720 and 100000057279452_212728265405748
      3. The database server is dedicated for MySQL 5.1 Community, has 4G RAM, and is Quad Core
      4. I've tuned the MySQL memory settings to match (I'm pretty sure I've done a decent job here, but could be wrong)
      5. I get timeout errors at times when trying to get a MAX() or doing an ORDER BY on these IDs
      6. Now that there's over 5 million Items saved, it can't handle the sorting necessary, so we need a better solution

      My thought is to:

      1. Create a table called StreamItemMaxIDs that stores the max IDs for each Twitter/FB account
      2. Instead of scanning the whole StreamItems table for MAX or sorted IDs each time I need it, simply store the MAX ID in this separate table so that the query is fast every time

      [I know that a NoSQL solution *might* be better (especially long-term), but am looking for a mySQL solution at the moment.]

      So, should I keep the logic of updating the MAX IDs in the application-level code, or write a mySQL trigger? (Keep in mind I'm not a DBA)

      Let me know if you need more info.

      [UPDATE: here's the table schema]

      [UPDATE 2: here's the dynamic SQL that times out under load]

       

      • views
      • Tweet
    • 5
      13 Apr 2011

      How to improve speed of a MySQL MAX() query

      • Edit
      • Delete
      • Tags
      • Autopost

      The only performance problem I've had with Refynr in the last few months is with a SQL statement I had been using to get the MAX Twitter tweet ID before doing a fresh API call to Twitter to get the most recent tweets. It only affected my @refynr account on the app because that account has the most Refyn'd Tweets in the system: over 200,000! The query that timed out quite often was trying to find the MAX() on the field that holds Tweet IDs (which are very large numbers).

      Anywho, the code below shows the new approach, which does an ORDER BY, DESC (w/ LIMIT 1) on the Primary Key of the table instead. The MAX() query was timing out at 30 seconds about half the time on my Refynr account, while the new query consistently performs in under 300ms! I don't quite understand the internals of why it's faster, so can someone shed some more light?

      Special thanks to this article for the idea: http://forge.mysql.com/wiki/Top10SQLPerformanceTips

      • views
      • Tweet
    • 0
      29 Dec 2010

      Top 10 Architecture Scalability Mistakes Made in ColdFusion

      • Edit
      • Delete
      • Tags
      • Autopost
      Repost from my old CFZen blog:

      Posted : Aug 8, 2009 12:56 AM

      This is a focus on architecture design mistakes I've seen made too often over the years by CF developers building ColdFusion architectures. Often times, the codebase and system started small and grew too fast, such that the original developer/architect didn't have the time or experience to scale the system up with the demands of high-load applications.

      1. Not dedicating enough time and resources on optimizing the database:

        Too often, there are smart, but unqualified programmers/developers designing and maintaining the database.  A qualified/certified DBA, even part-time, is well worth the investment. You need one who can profile your database, tune all the server and individual database settings, recomming index changes, be responsible for DB design and integrity, regular maintenance, set up replication/failover/clustering, and will proactively monitor your database to let you know what problems need attention in relationship to your applications.

      2. Not dedicating enough time and resources on optimizing queries:

        Again, developers tend to throw themselves into the role of writing all the queries, but don't necessarily think about the performance/scalability issues of writing non-optimal queries.  Often times a DBA who is qualified for all the Database Server stuff (see #1 above) isn't the same type of DBA who can optimize your queries.  You need someone who can run the queries with explain plans, and then can make recommendations on improvements.  These improvements pay off immensely!

      3. Coupling all parts of the system into one mega-application, instead of separate applications

        Usually they start out as fine: clever little applications that "do the job". They may have started in the CF 4 or 5 days, and grow and grow over the years until they're a mass of spaghetti code.  It may never occur to the developers that as the app gets bigger that not everything belongs together any more.  It will take some work, but it's important for long-term stability and scalability that you de-couple big chunks into smaller applications that run independent from the original beast.  Scheduled Tasks, Web Services, APIs, User Admin Areas, Reporting, file/image services, Search services, etc. should no longer be coupled to the main application.  Each of those parts should be broken out, so that each can be tuned optimally.  Do you want your whole system to start bottlenecking because Search or Web Services become slow or go down?  If they're decoupled, you don't have that problem.

      4. Not using a good "shared" file system scalable for high numbers of file writes

        When there are multiple physical servers in your cluster (if applicable), you may want a single location to save files.  No-no's here are mapped drives/mounts to a non-dedicated server (such as a drive on your database server, some external drive, etc.).  A good NAS and/or SAN solution with failover scales better as file writes become higher over time.  Cloud Computing solutions are becoming big now, too, but I haven't tested enough in this area to make sound recommendations. I'm sure other with more experience can tell us.  Comments?

      5. Not tuning the JVM

        Many articles have already been written on this subject, so I won't go into it.  If you're having performance problems and you don't know how to tune your JVM, get help from one of the many articles out there, hire an expert, or get Adobe Platinum support for CF.

      6. Not tuning Request settings in the ColdFusion Administrator

        See #5 above.

      7. Not making good use of Multiple Instances

        When you de-couple areas of your application (see #3), often it's wise to put your new applications into separate CF instances.  For instance, all heavy tasks (Scheduled Tasks and other "batch" type processes) should run on a separate instance.  You don't want to know how many times I've seen a Scheduled Task bring the entire infrastructure to it's knees.

      8. Not using CF Monitoring tools, such as SeeFusion, CF8 Server Monitor, and/or FusiontReactor

        Get familiar with at least 2 of these, and hire an expert to help you get up to speed on them more quickly.  These save your butt!

      9. Trying to recreate the "holy grail" reporting system from scratch

        First, any reporting system should be a prime candidate for de-coupling.  And reports that come from the database should not be running against your "write" (master) database, but rather against your "read-only" (slave) database, or else you'll have a lot of bottlenecks on your hands as users create massive reports reading data while other users are trying to simultaneously write data via the rest of the application.

        But secondly, look for reporting solutions that are already available.  Most of the time users don't really need *real-time* reports, and off-line/archived data lends itself better to the more complex reports that users start to request over time (allowing Excel exports makes a lot of people happy, cuz they know how to generate reports "the way they like them" in familiar Excel.  Also, don't write a Report UI that allows users to request *too many* records - your milage will vary, but it's almost never a good idea to allow reports that have no filters, such that they can just *get everything* all at once.  Few web apps are built for handling that much data, so find other solutions.

      10. Not taking advantage of various caching capabilities:

        We all know (hopefully) to cache queries where possible, but... do you use the Application/Server scope where appropriate, and to use cfcache (or other 3rd party CF cache tags/CFCs/frameworks) in areas that rarely change?  Do you come up with caching strategies for data, so that it can persist in the Application/Server scope without going back to the dB unnecessarily?  Could other data/files be cached to disk?  Can you use Verity (or Lucene/Solr) for searches instead of pulling directly from the database all the time?

      Aaron Longnion
      http://refynr.com

      • views
      • Tweet
    • 18
      10 Nov 2010

      Top 14 tools for my web startup

      • Edit
      • Delete
      • Tags
      • Autopost

      I've been working on my solo web start up full-time for about 2 months now, and so this post is about the software, programming platforms/frameworks, operating systems, websites, and apps that I use to stay fast and productive.

      First, let me quickly bullet-point my setup:

      Local Development:

      • Late 2009 MacBook Pro w/ 8Gigs of RAM
      • Railo, with built-in web server
      • Eclipse w/ CFEclipse, Subclipse, & MXUnit plugins
      • MySQL Community 5.1.51

      Cloud Server (Rackspace):

      • CentOS 5.4
      • Railo 3.2
      • Apache 2.2
      • MySQL Community 5.1.51

      Top tools list:

      1. Railo - similar to Adobe ColdFusion, but free and open-source, Railo (I'm using the free 3.2 beta version on my local development environment, which is compatible with Adobe ColdFusion 9.01). It's been great! Even though I've been working professionally with only the Adobe version of CF for 10 years, jumping to Railo was fairly painless, once I learned how to install it properly, etc.
      2. Gmail w/ Google Apps - I set up refynr.com mail for GMail and Google apps cuz it's free and very easy to do. I created an info@ address and do all my communication and responding to feedback from there. I also use Google Docs and Spreadsheets for keeping track of competition, Refynr features, ideas, etc. This is a no-brainer.
      3. Rackspace CloudServers: cost me all of $43 dollars last month, which included [1] bandwidth for over 70 Refynr Alpha testers (which does a lot of pulling from the Twitter & Facebook APIs), [2] full snapshot image backups of the server, [3] support via email or Live Chat, and [4] engineers that monitored and automatically rebooted my server when it went down mysteriously this past weekend. I'm happy; however, Amazon recently announced free pricing for lower-end cloud servers, and I'm keeping my eye on that...
      4. HooteSuite - read and post with multiple accounts. I manage 2 Twitter Accounts, Facebook, and LinkedIn from it. I prefer it over TweetDeck so that I can log in from any computer any time without worrying about installing/updating. Also, the analytics for my accounts and Google Analytics for my websites are nice.
      5. CFEclipse - it has great built-in support for Railo, and it's what I've already used for the past 5/6 years, so I found no need to try anything else right now.
      6. MyVersionControl.com - free/cheap Subversion hosting, with more features than I need. I just need the SVN repo space (hosted and secure), and use the Subclipse Eclipse plugin with CFEclipse. Simple. Done.
      7. MySQL - It's free and I already have a few years of experience with it. I did some preliminary investigation into the NoSQL options like MongoDB and CouchDB, but I don't need massive scalability (yet), so am sticking with what I know for now so that I don't get stuck to often. I'd rather just get er done at the moment.
      8. CentOS 5.4 - pretty much an exact replica of RedHat 5.4, so there's lot's of online documentation and help. Plus, it came as an inexpensive option with Rackspace. :)
      9. Sequel Pro - easy-to-use, free, and beautiful MySQL client for Mac. It does what I need 99% of the time, but sometimes I need to use MySQLWorkbench (which I found buggy and cumbersome for everyday use) for stuff like updating certain types of indexes or constraints.
      10. Posterous - this blog runs on Posterous in case you didn't notice. It's great if I need to do a quick post from email, but I like it more for posting via web, adding quick slide shows, and being able to auto-post to Twitter, Facebook, LinkedIn and FriendFeed. I get way more hits on my blog via Posterous than previous platforms I used. This is partially due to making sure it's in RSS aggregators like http://www.coldfusionbloggers.org/ and http://feeds.adobe.com/index.cfm?query=byCategory&categoryId=1&catego...
      11. CyberDuck - their website says it all - "Open source FTP, SFTP, WebDAV, Cloud Files, Google Docs & Amazon S3 Browser for Mac & Windows." (and FREE!)
      12. Seashore - Photoshop cost money, and GIMP is too complicated for me. I wanted something simple for common tasks. On Windows, that was Paint.NET. On Mac, it's Seashore! (note: I clearly need some better graphics skillz)
      13. jQuery Mobile (Alpha) - it makes the site look good on most mobile devices (iPhone, iPad, Android, BlackBerry, etc.), but also looks good on Firefox and Safari. Now, I'll just need to make Refynr.com look decent on IE 8+ and work out the kinks in Chrome. That's pretty good cross-browser/cross-platform compatibility with relatively little effort! And since it's in Alpha, it will only getting better in the next few months... Here's my post about implementing jQuery Mobile on Refynr.com.
      14. CFML Open Source code - First, I decided to not use a ColdFusion Framework (even though ColdBox, CFWheels, and FW/1 were VERY tempting) because I wanted to write the core of Refynr.com from scratch and to make damn sure there's no performance issues when/if I need to scale Refynr.com. In my professional career, I have never built anything very big from scratch; I have always worked on legacy ColdFusion code. Here's the open-source ColdFusion code I've used so far:
        • MonkehTweet - http://monkehtweet.riaforge.org/ for Twitter API & oAuth integration
        • Facebook ColdFusion SDK - https://github.com/affinitiz/facebook-cf-sdk/ for FB API integration
        • ValidateThis - http://www.validatethis.org/ for form validation
        • cfUniForm - http://cfuniform.riaforge.org/ for building forms
        • Timezone.cfc - https://github.com/rip747/TimeZone-CFC for timezone management
        • MXUnit.org - for unit testing CF code

      Honorable mentions (hope to use in the future) - LogBox, Rackspace CloudFiles, MemCached, MongoDB or CouchDB (?), VirtualBox, Parallels, ManyMoon, Chargify, Selenium, and IETester. And I am testing http://www.feedbackify.com/ for getting feedback off of Refynr.com, but may try http://GetSatisfaction.com at some point, too.

      Comment on what tools you use for your startup...

      • views
      • Tweet
    • 0
      7 Nov 2010

      Quick MYSQL Tip: CAST varchar before using MAX

      • Edit
      • Delete
      • Tags
      • Autopost
      When you have a non-numeric data type, even if they are all numeric values, do this...

      SELECT     MAX(CAST(myVarcharID AS UNSIGNED)) AS myID
      FROM        myTable;

      instead of just...

      SELECT     MAX(myVarcharID) AS myID
      FROM        myTable;

      cheers!
       

      • views
      • Tweet
    • 1
      4 Aug 2010

      Re-runnable MySQL release scripts

      • Edit
      • Delete
      • Tags
      • Autopost

      In a dev team, often times you manage database release scripts via "release***.SQL" files that are built up by various team members and saved in version control systems such as SVN or GIT. One problem is that when developer A makes some data or schema/design changes to the database, developer B & C don't necessarily know if it will cause duplicate records, or with schema changes it can cause errors if the scripts are run more than once. Here's a few concrete examples:

      1. For a Users table, developer A writes a sql script to insert some new records. Developer C updates to HEAD in SVN and runs the scripts from developer A. A few days later, Developer B writes some additional scripts to insert other new records. Developer C updates SVN again and wants to make sure his database is up-to-date, but he's smart and doesn't want to mess up anything with duplicate records, so he asks Developer A and B which scripts have already been run, etc. Developer A and B can't remember exactly, so Developer C does some SVN diffs and confirms which records have already been written to his local copy of the database. Now he knows he only has to run the new scripts from Developer B, and can skip the ones from Developer A.
      2. Developer A creates some new tables. Developer C runs the scripts on his local database. Developer B creates more new tables. When Developer C runs the scripts again, he gets errors about a database table already existing. He has to investigate a little to see what the error was all about, and if it's really a problem, or just an expected side-effect.

      Yuck! That's too much of a headache for team members to have to worry about. So, let's make some rules about release scripts for the database:

      1. There should never be errors. If there's an error, then there was a problem in the script and someone should fix that error. If not, then we should be able to safely assume the database is properly updated.
      2. You should be able to always run all the scripts in a release file without causing errors or duplicate rows of data. Write your scripts so that every time the scripts are updated, the next developer can run the entire file safely. This takes extra work up-front, but saves a lot of confusion and headaches over the long term.

      For creating new tables, MySQL provides native syntax: CREATE TABLE IF NOT EXISTS `Employees` (...

      For making sure you only add a column once, or only change a column name once, the code below includes stored procedures to handle it: AddColumnUnlessExists () and ChangeColumnUnlessExists().

      For making sure you only add a row of data once, MySQL also provides some native, though verbose, syntax. Here's the SQL to make it happen.

      Note that there's also version control tools for databases, but that may or may not work for your team and I am not familiar with those tools.... yet.

       

      • views
      • Tweet
    • 0
      19 Jan 2010

      Force RPM install of MySQL over old version

      • Edit
      • Delete
      • Tags
      • Autopost

      I'm not sure if this is smart in all situations, but if you have a borked version of MySQL (5.0 in my case), and want to install a higher version of it, you can use these commands in Linux (RHEL in my case) to force installation on the server- and client- RPM files, assuming those files are in the current directory:

      rpm -ivh MySQL-* --aid --force --nodeps --replacefiles

      Aaron Longnion
      Director of Internet Technology
      www.PeopleXS.com
      www.Twitter.com/aqlong
      Join the PeopleXS Community at www.peoplexs.com/communities/nl/


      • views
      • Tweet
    • Search

    • Sites I Like

      • Sean Corfield's blog
      • Filter Twitter & Facebook w/ Refynr
      • Raymond Camden's ColdFusion Blog
      • Aaron West's Blog
      • Ben Nadel's blog
      • A Smart Bear blog
      • IttyBiz blog
      • Charlie Arehart ColdFusion Consulting
    • Tags

      • refynr
      • ColdFusion
      • startup
      • Twitter
      • entrepreneur
      • social media
      • facebook
      • railo
      • Jobs
      • cloud computing
      • programming
      • social networks
      • legacy code
      • cfml
      • jquery
      • marketing
      • video
      • Developer
      • Holland
      • design
      • logo
      • rackspace
      • LinkedIn
      • Subversion
      • crowdsource
      • googleplus
      • linux
      • security
      • sql
      • Peoplexs
      • SEO
      • Verity
      • adobe
      • conferences
      • openbd
      • radio
      • redhat
      • regex
      • Balsamiq
      • Ben Forta
      • CFBuilder
      • CFUG
      • Napkee
      • SOLR
      • SOTR
      • air
      • android
      • austin
      • beta
      • blog
      • bufferapp
      • centos
      • cfeclipse
      • chrome
      • clustering
      • css
      • curation
      • customer service
      • database
      • dutch
      • feedback
      • filter
      • filtering
      • funding
      • git
      • gmail
      • google
      • humor
      • i18N
      • iphone
      • jQM
      • java
      • jrun
      • json
      • microsoft
      • mobile
      • movies
      • mxunit
      • nosql
      • ocfs
      • open office
      • pitch
      • public speaking
      • rhel
      • scalability
      • sql server
      • svn
      • ui
      • unit testing
      • ux
      • web
      • webapp
      • website
      • websockets
      • windows
      • work
      • youtube
    • Archive

      • 2012 (6)
        • February (4)
        • January (2)
      • 2011 (57)
        • September (2)
        • August (1)
        • July (4)
        • June (3)
        • May (9)
        • April (5)
        • March (14)
        • February (8)
        • January (11)
      • 2010 (66)
        • December (13)
        • November (11)
        • October (7)
        • September (8)
        • August (5)
        • July (2)
        • June (2)
        • May (2)
        • April (7)
        • March (3)
        • February (1)
        • January (5)
      • 2009 (21)
        • December (7)
        • November (5)
        • October (3)
        • September (5)
        • August (1)
    • Obox Design
  • Solo Web Startup

    Web developer/architect turned solo web entrepreneur

    324925 Views
  • Get Updates

    Subscribe via RSS
    TwitterTwitterFacebookPageFacebookLinkedInFriendfeedDelicious