Tags: EAR / python

Update 2011/02/23 11:02 PST
Added the lift tag and updated the list.

Update 2011/02/22 13:19 PST
Added the jsf tag (JavaServer Faces) and updated the total question count for each item on the list.

Update 2011/02/22 11:14 PST
Adding spring-mvc, as that was what I was originally supposed to have.

Update 2011/02/22 10:36 PST
For the interested, here is the table used to generate the graph.

Update 2011/02/22 10:05 PST
As per comments on this post, I updated the list by removing hibernate, spring, and sass and adding gwt and grails. I also updated the chart to reflect this information, and created an additional chart which plots the frameworks as a percentage of the questions asked each week to hide StackOverflow’s growing popularity.

Adam and I are currently researching the Execution After Redirect, or EAR, vulnerability, which I previously discussed in my blog post about the 2010 iCTF. While Adam is working on a static analyzer to detect EARs in ruby on rails projects, I am testing how easy it is for a developer to introduce an EAR vulnerability in several popular web frameworks. In order to do that, I first needed to come up with a mostly unbiased list of popular web frameworks.

My first thought was to search for the top web frameworks, hoping that the information I sought was already available. This search provided a few interesting results, such as the site Best Web-Frameworks as well as the page Framework Usage Statistics by the group BuiltWith. The Best Web-Frameworks page lists and compares various web frameworks by language; however, it offers no means to compare the adoption of each. The Framework Usage Statistics page caught my eye because its usage statistics are generated by crawling and fingerprinting websites to determine which frameworks are in use. Their fingerprinting technique, however, is too generic in some cases, which results in languages like php and perl being labeled as frameworks. While these results were a step in the right direction, what I was really hoping to find was a list of top web frameworks that follow the model-view-controller (MVC) architecture.

After a bit more consideration I realized it wouldn’t be simple to get a list of frameworks by usage, so I had to consider alternative metrics. I figured I could measure the popularity of a framework by the number of developers using it, or at least interested in it. It was this train of thought that led me to both Google Trends and StackOverflow. Google Trends allows one to directly compare various search queries over time, such as ruby on rails compared to python. The problem, as evidenced by the former link, is that some of the search queries don’t directly apply to the web framework; in this case not all the people searching for django are looking for the web framework. Because of this problem, I decided a more direct approach was needed.

StackOverflow is a website geared towards developers where they can go to ask questions about various programming languages, development environments, algorithms, and, yes, even web frameworks. When someone asks a question, they can add tags to the question to help guide it to the right community. Thus if I had a question about redirects in ruby on rails, I might add the tag ruby-on-rails. Furthermore, if I were interested in questions other people had about ruby on rails, I might follow the ruby-on-rails tag.

Between the number of questions per tag, the number of answers per tag, and the number of followers per tag, StackOverflow provides a few metrics for measuring the relative level of developer interest in various web frameworks. Success! The next step was to extract these numbers for the tags of various frameworks. For this, I attempted to find StackOverflow tags corresponding to all the frameworks listed on the Best Web-Frameworks site I previously found. I skipped the framework languages CSS and Javascript as they aren’t server-side frameworks. I then narrowed the list down to the frameworks which had at least 100 questions asked.

This produced the following frameworks sorted by total number of questions asked:

  1. (31156) ruby-on-rails
  2. (20587) asp.net-mvc
  3. (14951) django
  4. (4726) zend-framework
  5. (3510) jsf
  6. (3336) gwt
  7. (3296) cakephp
  8. (3127) codeigniter
  9. (2731) grails
  10. (1976) spring-mvc
  11. (1603) symfony
  12. (912) struts
  13. (538) kohana
  14. (515) pylons
  15. (514) sinatra
  16. (506) dotnetnuke
  17. (420) wicket
  18. (227) lift
  19. (194) yii
  20. (163) cherrypy
  21. (126) web2py
  22. (106) catalyst
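
The narrowing and sorting step can be sketched roughly as follows. This is only an illustration, not the code that produced the list: the helper name rank_tags is hypothetical, the first three counts are copied from the list above, and the 'tiny-framework' entry is made up to show the 100-question cutoff.

```python
def rank_tags(question_counts, min_questions=100):
    """Return (count, tag) pairs with at least min_questions, largest first."""
    kept = [(count, tag) for tag, count in question_counts.items()
            if count >= min_questions]
    return sorted(kept, reverse=True)

# The first three counts come from the list above; 'tiny-framework'
# and its count are invented to demonstrate the cutoff.
sample = {
    'django': 14951,
    'ruby-on-rails': 31156,
    'asp.net-mvc': 20587,
    'tiny-framework': 42,
}

for position, (count, tag) in enumerate(rank_tags(sample), 1):
    print('%d. (%d) %s' % (position, count, tag))
```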

The following were originally included but are not web frameworks: hibernate, spring, and sass.

This list alone seems to work fairly well; however, I wanted to take it one step further and see the number of questions asked per week since the start of StackOverflow. Using the StackOverflow API (I used the API to generate the previous list too), I wrote a script to generate a CSV file containing this information. The information is depicted in the interactive chart below for the top 10 frameworks according to total number of StackOverflow questions. Each point in the chart represents the number of questions asked in a one-week period starting on the date of the data point (protip: hover over chart to get the exact values).

[Chart: top10, questions asked per week for the top 10 frameworks]
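
As a rough sketch of how each data point maps to a one-week window, the boundaries can be derived from a fixed start timestamp. The DATE_START and WEEK_SECONDS values are the ones used by the script further down; the helper names week_windows and label are hypothetical, and formatting the label as a UTC date is an assumption about how the chart dates were produced.

```python
import time

DATE_START = 1217540572            # start timestamp used by the script
WEEK_SECONDS = 7 * 24 * 60 * 60    # 604800

def week_windows(now, start_week=0):
    """Yield (index, start, end) for each complete week before `now`."""
    num_weeks = (now - DATE_START) // WEEK_SECONDS
    for i in range(start_week, num_weeks):
        start = DATE_START + i * WEEK_SECONDS
        yield i, start, start + WEEK_SECONDS

def label(timestamp):
    """Format a Unix timestamp as a UTC date (assumed label format)."""
    return time.strftime('%Y-%m-%d', time.gmtime(timestamp))

# Print the first three windows for an example "now" three weeks in.
example_now = DATE_START + 3 * WEEK_SECONDS + 5
for index, start, end in week_windows(example_now):
    print('week %d: %s to %s' % (index, label(start), label(end)))
```

Each window's end is the next window's start, so a question is counted in exactly one week as long as the API treats one of the two bounds as exclusive.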

The data confirms my previous suspicion that ruby on rails is the number one MVC framework and that django and cakePHP also appear in the top 10. I must admit that I had never before heard of asp.net MVC; however, considering that stackoverflow and all other stackexchange sites run on asp.net MVC, it makes sense that it would rank quite high.

I added the below chart to show the relative percentage of questions per tag over time as per Big Dave’s Gusset’s comment. This hides the growing popularity of stackoverflow.

[Chart: normalized_top10, percentage of questions per tag per week for the top 10 frameworks]
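
The normalization itself is simple arithmetic. A minimal sketch, assuming the denominator is the total number of questions asked on the site in the same week (the post does not show how that total was obtained); the function name as_percentages and the numbers below are made up for illustration.

```python
def as_percentages(tag_counts, total_questions):
    """Convert one week's per-tag counts to percentages of all questions."""
    if total_questions == 0:
        # No questions at all that week: report zero share for every tag.
        return [0.0] * len(tag_counts)
    return [100.0 * count / total_questions for count in tag_counts]

# Invented numbers for two weeks: the raw counts double along with the
# site total, so the percentage shares stay the same.
week_a = as_percentages([50, 25], 1000)
week_b = as_percentages([100, 50], 2000)
print(week_a)
print(week_b)
```

Dividing by the weekly total is what removes the overall growth of the site from the picture, leaving only each tag's share.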

The data for the above chart was extracted using the following script. The script requires the python package py-stackexchange in order to run and can be easily modified to add additional tags or change the filtering methods.

#!/usr/bin/env python
import datetime, sys, time
from stackexchange import Site, StackOverflow

frameworks = [# php
              'zend-framework', 'cakephp', 'symfony', 'codeigniter', 'seagull',
              'prado', 'solar', 'ezcomponents', 'kohana', 'jelix', 'flow3',
              'modx', 'sapphire', 'yii', 'limonade', 'tekuna', 'doophp',
              'fat-free', 'akelos', 'php-on-trax', 'atk',
              # ruby
              'ruby-on-rails', 'merb', 'ramaze', 'halcyon', 'sinatra', 'webby',
              'sass',
              # perl
              'catalyst', 'interchange', 'mason', 'cgi-application', 'jifty',
              'gantry', 'dancer', 'mojolicious',
              # java
              'struts', 'hibernate', 'spring', 'wicket', 'play', 'stripes',
              # python
              'django', 'pylons', 'grok', 'turbogears', 'web2py', 'cherrypy',
              # coldfusion
              'cfwheels', 'coldspring', 'model-glue',
              # asp.net
              'asp.net-mvc', 'dotnetnuke', 'monorail', 'vici']

class TagStats(object):
    DATE_START = 1217540572
    WEEK_SECONDS = 604800
    def __init__(self, tag_names):
        self.so = Site(StackOverflow, 'LzYJwh19o0WCIvXK9q6k6g')
        self.tag_names = tag_names
        self.tags = []
        self.stats = {}

    def output_counts(self, html=False):
        # Pair each tag with its question count, most-asked first.
        tmp = []
        for tag in sorted(self.tags, key=lambda x: x.count, reverse=True):
            tmp.append((tag.count, tag.name))
        if html:
            # Emit the ranking as an HTML ordered list.
            print('<ol>')
            for count, name in tmp:
                print('<li>(%d) %s</li>' % (count, name))
            print('</ol>')
        else:
            print('\n'.join('%8d %s' % x for x in tmp))

    def get_tags(self, min_size):
        for name in self.tag_names:
            query = self.so.tags(filter=name)
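            # The filter matches substrings, so look for the exact tag name.
            for tmp in query:
                if name == tmp.name:
                    break
            else:
                # The loop finished without a break: no exact match exists.
                sys.stderr.write('Not found: %s\n' % name)
                continue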
            if tmp.count < min_size:
                sys.stderr.write('Too few questions: %s\n' % name)
            else:
                self.tags.append(tmp)
        # Give every tag its own list; [[]] * n would share one list object.
        self.stats = dict((tag.name, []) for tag in self.tags)

    def output_stats_by_week(self, start_week=0):
        now = int(time.time())  # portable, unlike strftime('%s')
        num_weeks = (now - self.DATE_START) // self.WEEK_SECONDS
        print(', '.join(tag.name for tag in self.tags))
        for i in range(start_week, num_weeks):
            sys.stdout.flush()
            start = self.DATE_START + i * self.WEEK_SECONDS
            end = self.DATE_START + (i + 1) * self.WEEK_SECONDS
            counts = []
            for tag in self.tags:
                try:
                    count = self.so.questions(tagged=str(tag.name),
                                              fromdate=start,
                                              todate=end).total
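                except Exception as e:
                    # Report where we stopped so the run can be resumed
                    # by passing this week number as the first argument.
                    sys.stderr.write('Stopped at week %d: %s\n' % (i, e))
                    sys.exit(1)
                self.stats[tag.name].append(count)
                counts.append(str(count))
            print(', '.join(counts))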


def main():
    try:
        start_week = int(sys.argv[1])
    except IndexError:
        start_week = 0
    tag_stats = TagStats(frameworks)
    tag_stats.get_tags(100)
    #tag_stats.output_counts(html=True)
    tag_stats.output_stats_by_week(start_week)

if __name__ == '__main__':
    sys.exit(main())

Happy web-framework coding!

