Saturday, September 20, 2008

Startup Development Strategy Mistakes

It is a very macho and cool thing to brag about a few engineers getting together with the latest web technologies, such as Python and Ruby on Rails, and cranking out a prototype web site and platform to get a business up and running on the internet. The theory is: who cares what we develop now? We will toss it anyway if and when the business takes off. Who knows if this business is even going to make it? Why waste our time thinking about a robust platform now? Let’s get something up and running.

There is some merit and truth to this strategy, but it has very real downsides if a business evolves from this first “prototype”. What happens if the business model turns out to have real merit? Without any consideration for a scalable and extensible architecture, the development team and the company can find themselves in real trouble right when the business needs to grow and address the demands of the marketplace. Look at some of the popular social networks. They are terribly unstable and at times do not work at all, and these are companies with deep pockets and big engineering staffs. Imagine a company that is not as well funded dealing with similar issues. This situation can become a real nightmare and can potentially undermine the business. It is like changing the “engine and wheels on the car” while you are moving at light speed. Trust me, you do not want to find yourself in this situation if at all possible.

To avoid this problem, the engineering staff should, during the early startup phase, put some thought into the “what if this business is successful?” scenario. What can we do now to prepare for success without significantly slowing down the effort to get the early version up and running?

The following are some areas to be considered.

Development Platform: Just because a programming environment is easy to use and is the cool web 2.0 platform du jour does not necessarily mean it is going to help you when you have more than three developers and 4 hits a day to your site.
If you are even remotely successful, you will need to employ a caching strategy to decrease hits to the DB, tune and configure the web servers for memory management, integrate with a number of web service APIs, and integrate with a build and deploy environment.
Make sure your platform will give you this flexibility. Pick a platform that is proven to be robust and scalable, has a large development community supporting it, has a large pool of developers in your area who use it, and supports object-oriented principles.
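As a rough illustration of the caching point above, here is a minimal cache-aside sketch. It assumes a simple in-process dictionary cache with a time-to-live; in a real deployment this role is usually played by a shared cache server. `TTLCache` and `load_user_from_db` are hypothetical names, and the loader is a stand-in for a real database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper (hypothetical, for illustration only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Any, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Any, loader: Callable[[Any], Any]) -> Any:
        """Return the cached value, calling loader(key) only on a miss or after expiry."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)  # the only place the database is touched
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Any) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._store.pop(key, None)


db_queries = 0


def load_user_from_db(user_id: int) -> dict:
    """Hypothetical stand-in for a SELECT against a user table."""
    global db_queries
    db_queries += 1
    return {"id": user_id, "name": f"user{user_id}"}


cache = TTLCache(ttl_seconds=60)
for _ in range(100):
    cache.get_or_load(42, load_user_from_db)
print(db_queries, cache.hits, cache.misses)  # 1 99 1
```

One hundred reads of the same record cost a single database query. The trade-off is staleness: writes must call `invalidate`, or readers may see data up to `ttl_seconds` old.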

Development Methodology: Set ground rules for coding standards, naming conventions, method/class definition and usage, database development strategy and responsibility, common library creation, source code tree structure, source code management, etc. Think about the future. For example, in my most recent assignment we started out with a monolithic code base that required the entire code base to be committed for a change to any part of it. This eventually resulted in gridlock when multiple projects with different lifecycles and delivery dates were being worked on at the same time. Break up your application into discrete sections that allow separate development efforts to move independently.

The Database: This may be the most important decision you make!!!! It is all the rage to go the open source route (MySQL, Postgres, etc.). These platforms have their merits, but don’t pick one just because it has no licensing fee. Web applications are characteristically two-tier, DB-centric applications. It is all about the data. When choosing a DB, find out what it is good at and what it is not. Does it support stored procedures and triggers? Does it support clustering? If so, does the clustering work, and is anyone actually using it? How does it handle high transaction volume? How well does it replicate and back up? Does backup activity have a significant impact on the production database? Are you going to protect your data? If so, how? Do you need to warehouse your data for reporting? What is your redundancy strategy? If the main DB goes belly-up, who are you going to call? Will you be prepared? Does the provider have a good support staff?

Security: What is the security strategy for your website, data, passwords, etc.? The last thing you want is to have your site hacked in the middle of a big ramp-up in traffic. This is devastating from both a PR and a practical perspective. Establish coding standards that emphasize security. Use a security scanning service that scans your site daily for vulnerabilities. Determine what data needs to be encrypted. Who will have access to what in your system? Lock the environment down as best you can.
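On the password side of a security strategy, one widely used practice is never to store passwords in plain text and instead store a salted, deliberately slow hash. The sketch below uses PBKDF2 from Python’s standard `hashlib`. The iteration count and the `iterations$salt$hash` storage format are illustrative assumptions, not settings tuned for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your own hardware


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' for storage in the user table."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations_str, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations_str)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


record = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", record))  # True
print(verify_password("wrong guess", record))  # False
```

Storing the iteration count with each hash lets you raise the work factor later without invalidating existing accounts.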

Reporting: This is an area that developers frequently do not take seriously, but it is a big deal from a business perspective. Find out what the business owners need to determine the health and welfare of the business. Make sure that your data structures, tags, events, page views, and naming conventions are structured in such a way that the business owners can get what they need out of the reporting system. If you are contemplating an off-the-shelf web tracking tool, make sure it is going to give you what you really need. In many cases these tools only count page views and do not help you determine how people are navigating through your site. In most cases the data accumulated by these applications is hard to combine with the customer data you are accumulating in your DB. If you are going to use a sophisticated product like Omniture, make sure you understand what comes out of the box and what requires “consulting” expertise. The consulting can break the bank.
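To make the reporting point concrete, here is a minimal sketch of logging page views as events keyed by your own user and session IDs, so navigation paths can be reconstructed and joined with customer data in your DB. The event fields (`user_id`, `session_id`, `page`, `ts`) and the `customers` table are hypothetical examples, not the schema of any particular tracking product.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PageView:
    user_id: int      # same ID as in your customer table, so the data can be joined
    session_id: str
    page: str
    ts: float         # seconds since epoch


def navigation_paths(events: List[PageView]) -> Dict[str, List[str]]:
    """Group page views by session and order them by time to get each visitor's path."""
    by_session: Dict[str, List[PageView]] = defaultdict(list)
    for e in events:
        by_session[e.session_id].append(e)
    return {sid: [v.page for v in sorted(evs, key=lambda v: v.ts)]
            for sid, evs in by_session.items()}


def transitions(paths: Dict[str, List[str]]) -> Counter:
    """Count page-to-page moves, which raw page-view totals cannot show."""
    counts: Counter = Counter()
    for path in paths.values():
        counts.update(zip(path, path[1:]))
    return counts


events = [
    PageView(1, "s1", "/home", 0.0), PageView(1, "s1", "/pricing", 5.0),
    PageView(2, "s2", "/home", 1.0), PageView(2, "s2", "/signup", 9.0),
    PageView(1, "s1", "/signup", 12.0),
]
customers = {1: {"plan": "trial"}, 2: {"plan": "paid"}}  # hypothetical customer table

paths = navigation_paths(events)
print(paths["s1"])  # ['/home', '/pricing', '/signup']
print(transitions(paths)[("/home", "/signup")])  # 1
# Join behaviour with customer data: which plans reached the signup page?
signup_plans = {customers[e.user_id]["plan"] for e in events if e.page == "/signup"}
print(sorted(signup_plans))  # ['paid', 'trial']
```

The key design choice is using your own IDs in every event; that is what makes the join with customer records trivial instead of a consulting project.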

Hosting Center: After having to change DBs midstream, this is the second biggest risk. You do not want to have to move from one hosting center to another. Have a good plan for what hosting will look like in a successful scenario. Can you architect your application in a way that lets you leverage server virtualization and services supplied by existing application operators such as Amazon and Salesforce.com? It would be great if you could take full advantage of another outfit’s experience and expertise. Do you need hands-on system engineering resources to manage your servers? If so, will you have enough of these resources when the company is firing on all cylinders? Leverage what is going on in new hosting models to decrease your staffing needs and your overall hosting costs. Try to estimate what your bandwidth needs will be. Different hosting models have different pricing structures, and bandwidth is one of the costs that gets played with in a hosting contract.
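A back-of-the-envelope bandwidth estimate takes only a few lines. The traffic figures below (page views per day, average page weight of 500 KB, a 3x peak-to-average ratio) are placeholder assumptions to be replaced with your own projections, not numbers from any real site. Decimal units are used throughout (1 KB = 1,000 bytes).

```python
def monthly_transfer_gb(page_views_per_day: float, avg_page_kb: float, days: int = 30) -> float:
    """Total data served per month in GB (decimal units: 1 GB = 1,000,000 KB)."""
    return page_views_per_day * avg_page_kb * days / 1_000_000


def peak_mbps(page_views_per_day: float, avg_page_kb: float, peak_to_avg: float) -> float:
    """Approximate peak throughput in megabits per second."""
    avg_kb_per_s = page_views_per_day * avg_page_kb / 86_400  # 86,400 seconds per day
    avg_mbit_per_s = avg_kb_per_s * 8 / 1_000                 # KB/s -> kbit/s -> Mbit/s
    return avg_mbit_per_s * peak_to_avg


# Hypothetical growth scenarios with an assumed 500 KB average page weight
for views in (10_000, 100_000, 1_000_000):
    print(f"{views:>9,} views/day: {monthly_transfer_gb(views, 500):8,.0f} GB/month, "
          f"~{peak_mbps(views, 500, 3):6.1f} Mbit/s at a 3x peak")
```

Running a few scenarios like this before signing a contract shows whether a plan priced on monthly transfer or one priced on peak (burstable) bandwidth fits your growth curve better.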

These are some of the obvious things you should address early in the development process to make everyone’s life easier. Make sure you properly sell this investment in infrastructure, architecture, and development time to the co-founders and the board. They will resist any investment that does not have an immediate benefit. However, they are usually savvy business people and will get it if you explain the risk-management aspect of the investment. In the end, if you think about these items, your life as a developer will be a lot better and you will be able to concentrate on what you most like to do, which is coding. More importantly, the business will be able to grow rapidly without significant engineering redesigns.

3 comments:

PirateGuillermo said...

Very good points. One specific example I'll offer is code introspection. Where I'm working now, we have a thin layer between our Java code and the JDBC driver. This abstraction does a little bit of management of the connection pool, but where we really get value is in timing the execution of SQL and in timing the processing of result sets. This isn't particularly interesting in itself, but we find it incredibly valuable when we push new code and want to know how it performs relative to the previous release. Whenever there's a performance problem, the first thing we do is go to our logging database and look at our tracers. If there's a table that needs statistics analyzed or a new index, or if there's a new query that's got too complex a query plan, we find out immediately.

Having tracers that time execution frequency and duration around different blocks of code helps us any time we do new feature development, as well as later on, when we're doing an optimization cycle. It's really helpful to know which blocks of code your engineers should be spending their time making better.
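The commenter’s setup is Java over JDBC; as a rough illustration of the same idea, here is a minimal Python sketch using the standard DB-API driver `sqlite3`. A thin wrapper times SQL execution and result-set processing separately, and a generic `trace` context manager records call counts and total duration for any labelled block of code. `Tracer` and `TimedConnection` are hypothetical names, and a real system would ship these timings to a logging database rather than keep them in memory.

```python
import sqlite3
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class Tracer:
    """Collects call counts and total elapsed time per labelled block of code."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = defaultdict(int)
        self.total_s: Dict[str, float] = defaultdict(float)

    @contextmanager
    def trace(self, label: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:  # record the timing even if the block raises
            self.count[label] += 1
            self.total_s[label] += time.perf_counter() - start


class TimedConnection:
    """Thin layer between application code and the DB driver that times every query."""

    def __init__(self, conn: sqlite3.Connection, tracer: Tracer) -> None:
        self.conn = conn
        self.tracer = tracer

    def query(self, sql: str, params: tuple = ()) -> List[tuple]:
        with self.tracer.trace("execute: " + sql):
            cur = self.conn.execute(sql, params)
        with self.tracer.trace("fetch: " + sql):  # result-set processing timed separately
            return cur.fetchall()


tracer = Tracer()
db = TimedConnection(sqlite3.connect(":memory:"), tracer)
db.query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
for i in range(3):
    db.query("INSERT INTO users (name) VALUES (?)", (f"user{i}",))
rows = db.query("SELECT name FROM users ORDER BY id")

for label in sorted(tracer.count):
    print(f"{tracer.count[label]:3d} calls  {tracer.total_s[label] * 1000:8.3f} ms  {label}")
```

Keying the counters by SQL text makes it easy to compare the same statement across releases, which is the comparison the comment describes.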

Unknown said...

Yes, having tools like these is very important so you can react quickly to the impact of your changes on production. It is really hard to simulate real production activity in a QA environment; the activity on the web today makes it almost impossible. What else are you doing in this area?

PirateGuillermo said...

We time everything. We aggregate traffic into manageable chunks (count how many times a particular code branch executes in five minutes, say) and we log the counters. We trace execution times of code blocks and log the times. We track all this by host, so we can see if we have a sick machine or pathological network. Ultimately, it's all about data. Dump data into its own database so that the logging system doesn't interfere with customer traffic, decouple the logging from the database via some kind of asynchronous message bus, and then turn your data wonks loose on it all with Excel.

Truly, I'm surprised at the capabilities of the simple spreadsheet. Being able to turn all the raw data into a graph to visualize trends is amazingly powerful.

It also helps if your engineers are all data guys. No fire-and-forget coding: the developers have to track their code and see how they did.
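The aggregation scheme described in this comment can be sketched roughly as follows. Counters are keyed by host, code branch, and five-minute bucket, and are handed off through an in-process queue to a background thread that writes to a separate logging database. Here Python’s `queue.Queue` stands in for the asynchronous message bus and an in-memory SQLite database stands in for the dedicated logging DB; `BucketedCounters`, `log_writer`, and the `counters` table are hypothetical names.

```python
import queue
import sqlite3
import threading
from collections import Counter

BUCKET_SECONDS = 300  # five-minute aggregation window


class BucketedCounters:
    """Counts code-branch executions per host per five-minute bucket."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._lock = threading.Lock()

    def record(self, host: str, branch: str, ts: float) -> None:
        bucket = int(ts // BUCKET_SECONDS) * BUCKET_SECONDS  # start of the window
        with self._lock:
            self._counts[(host, branch, bucket)] += 1

    def flush(self, bus: queue.Queue) -> None:
        """Hand current counts to the bus and reset; request threads never touch the log DB."""
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
        for key, n in snapshot.items():
            bus.put((key, n))


def log_writer(bus: queue.Queue, conn: sqlite3.Connection) -> None:
    """Background consumer: drains the bus into the separate logging database."""
    while True:
        item = bus.get()
        if item is None:  # sentinel: stop after draining
            break
        (host, branch, bucket), n = item
        conn.execute("INSERT INTO counters VALUES (?, ?, ?, ?)", (host, branch, bucket, n))
    conn.commit()


# check_same_thread=False because the writer thread uses a connection created here
log_db = sqlite3.connect(":memory:", check_same_thread=False)
log_db.execute("CREATE TABLE counters (host TEXT, branch TEXT, bucket INTEGER, n INTEGER)")
bus: queue.Queue = queue.Queue()
writer = threading.Thread(target=log_writer, args=(bus, log_db))
writer.start()

counters = BucketedCounters()
for ts in (0, 10, 299, 300, 610):
    counters.record("web01", "checkout", ts)
counters.record("web02", "checkout", 42)
counters.flush(bus)
bus.put(None)
writer.join()

# Rows can be summed per bucket, since several flushes may cover the same window
for row in log_db.execute(
    "SELECT host, bucket, SUM(n) FROM counters "
    "GROUP BY host, branch, bucket ORDER BY host, bucket"
):
    print(row)
```

Because the writer only ever appends rows, repeated flushes for the same window are harmless; the `SUM` in the query does the final aggregation, and the result is easy to export to a spreadsheet for graphing.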