Friday, May 22, 2015

Function caching decorator [reprise]

Sudden inspiration to write a blog.

A long time ago, I wrote a post about a decorator that could cache the result of an expensive function. There were some ideas in the comments, but I never really followed up on the idea I spawned in that particular post. And I never really gave it more thought or research either.

Today I was at the PyGrunn conference. Tom Levine, author of Vlermv, gave a presentation about that package. He mentioned that he wrote a decorator called `cache` (ctrl+f on the page I linked) that he uses for exactly the same purpose I described in my original post. He also noted that `cache` would probably be a bad name for a decorator like that.

At the end of the presentation there was some time for questions. A lot of people gave Tom tips on packages he could look into, and there was one helpful attendee who called out Python's `memoised` decorator. I noted that one of the commenters on my original post also mentioned memoize, but that commenter linked to a decorator inside a Plone package. I searched a bit on the internet today and there's a class inside the PythonDecoratorLibrary that does exactly what I initially wanted.

So here's the link to that page.
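
A memoizing decorator boils down to roughly the following. This is a minimal sketch, not the class from the PythonDecoratorLibrary: the names memoize and slow_square are made up for illustration, and it assumes the function only takes hashable positional arguments.

```python
import functools


def memoize(fn):
    """Cache fn's results, keyed on its positional arguments."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        # Only call the real function the first time we see these arguments
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return wrapper


calls = []  # records every time the real function actually runs


@memoize
def slow_square(n):
    calls.append(n)
    return n * n


slow_square(4)
slow_square(4)
print(calls)  # [4]
```

On Python 3.2 and later, functools.lru_cache in the standard library covers the same ground.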

Friday, October 17, 2014

Object references in class definitions

It's been a while. Things happen, I guess.
Today I learned a nice thing about Python. I don't know the specifics, but I was told that this is an intentional design decision in Python.
The thing is: When you declare an object in a class body, the class holds a 'pointer' towards that one object, and every instance looks it up through the class. (I'm using the word 'pointer' here. The specifics might be a bit different. Feel free to elaborate in the comments). This means that if you have two instances of your class and you change that object through one of them, the change also shows up in the second instance of your class.
Code example to make things clear:
# Declare a class with two variables
class TestObj(object):
    obj_var = {}  # mutable object (a dict)
    val_var = 11  # plain value (an int)

# Create two instances of the class
test_one = TestObj()
test_two = TestObj()

# Change both variables of one instance
test_one.obj_var['key'] = 'value'
test_one.val_var = 10

# Print both variables of both instances
test_one.val_var
Out: 10  # Expected
test_two.val_var
Out: 11  # Expected

test_one.obj_var
Out: {'key': 'value'}  # Expected
test_two.obj_var
Out: {'key': 'value'}  # Huh?

The solution?
class TestObj(object):
    val_var = 11

    def __init__(self):
        self.obj_var = {}

Instead of using __init__ you can of course assign the variable on the instance (self.obj_var = {}) whenever you need it.

Update:
You might think it's a bad idea to change class variables on an instance of a class. You are correct, it feels bad and you probably shouldn't be doing things like I did in my example. However, the issue still exists when you let the class itself change its variables:
class TestObj(object):
    obj_var = {}
    
    def change_obj_var(self):
        self.obj_var['hello'] = 'world'
        
test_one = TestObj()
test_two = TestObj()

test_one.change_obj_var()

test_one.obj_var
Out: {'hello': 'world'}  # Expected
test_two.obj_var
Out: {'hello': 'world'}  # Huh?
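
As far as I understand it, the 'Huh?' comes down to the difference between changing an object and assigning a new one. The sketch below reuses the TestObj class from the first example; the variable names shared, instance_attrs and class_value are only there to make the result visible.

```python
class TestObj(object):
    obj_var = {}  # one dict, created once when the class body runs
    val_var = 11


test_one = TestObj()
test_two = TestObj()

# Changing the dict through an instance changes the single shared dict
test_one.obj_var['key'] = 'value'
shared = test_one.obj_var is test_two.obj_var is TestObj.obj_var
print(shared)  # True

# Assignment creates a new attribute on test_one only,
# the class attribute itself is left alone
test_one.val_var = 10
instance_attrs = vars(test_one)
class_value = TestObj.val_var
print(instance_attrs)  # {'val_var': 10}
print(class_value)     # 11
```

So the dict was never copied per instance, while the assignment to val_var simply hid the class attribute behind a new instance attribute.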

Thursday, December 15, 2011

Function caching decorator

Every once in a while, you have to create a function in one of your models, that does a lot of queries. If you use that function a couple of times, you might find it a wise thing to 'cache' the function. A 'cached' function will look like this:
def get_important_data(self):
    if not hasattr(self, '_important_data'):
        self._important_data = self.get_result_of_lots_of_queries()
    return self._important_data
If you call this function once on an object, it will run a lot of queries, but will store the result in self._important_data. The next time you call the function on this exact same object, the attribute self._important_data will still be set and the huge load of queries does not have to be executed again.
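
To see the pattern at work without a database, here is a small runnable sketch. The Report class and its query_runs counter are hypothetical stand-ins for a real model and its queries.

```python
class Report(object):
    """Hypothetical model-like class, only here to show the pattern."""

    def __init__(self):
        self.query_runs = 0

    def get_result_of_lots_of_queries(self):
        # Stand-in for the expensive database work
        self.query_runs += 1
        return ['row one', 'row two']

    def get_important_data(self):
        if not hasattr(self, '_important_data'):
            self._important_data = self.get_result_of_lots_of_queries()
        return self._important_data


report = Report()
first = report.get_important_data()
second = report.get_important_data()
print(report.query_runs)  # 1

other = Report()
other.get_important_data()
print(other.query_runs)  # 1, every object keeps its own cached result
```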

In some projects you have to create such a function more than once, for whatever reason. A colleague got sick and tired of writing the same code over and over again and came up with an idea to tackle this repetition. So he asked me if, when I had little to do, I could write a decorator that does exactly the same kind of 'caching' as the function above. I however have little to no experience in writing decorators, so I copy/pasted something together from the interwebz, that seems to do what I want.

This is what I came up with.

in decorators.py:
from functools import wraps

def cached_function(attr):

    def inner_cached_function(fn):

        def return_attr(*args, **kwargs):
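            try:
                instance = args[0]  # args[0] should be fn's self
            except IndexError:
                raise Exception("A function with the cached_function "
                                "decorator must use 'self' as first argument")

            # Store the result on the instance, not on its class, so that
            # different objects do not share each other's cached value.
            if not hasattr(instance, attr):
                value = fn(*args, **kwargs)
                setattr(instance, attr, value)
            return getattr(instance, attr)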

        return wraps(fn)(return_attr)

    return inner_cached_function
in a ModelClass:
    @cached_function('_important_data')
    def get_important_data(self):
        return self.get_result_of_lots_of_queries()
As said/written before, I have almost no experience with this, so please comment if you have anything to say about this solution.
Thanks :-)

Monday, July 11, 2011

FeinCMS and Fixtures: Check your trees

A short while ago, I finished my first FeinCMS website. While building the website, I was surprised by the way the tree of pages, subpages, subsubpages and so on, is being saved into the database. FeinCMS turned out to be using something called 'Modified Pre-order Tree Traversal'. Google or Bing or whatever for 'MPTT' and you will find out how it works, if you don't know it already. It seems I wasn't paying a lot of attention in mathematics classes, because all my colleagues seemed to know about its existence.

Anyway, a FeinCMS page's location in a tree will be stored in the database, using a tree_id, level (0 being the top level, 1 the row beneath and so on), parent_id and lft and rght. Lft and rght are obviously the numbers left and right of the tree entry, following the MPTT principle. If you somehow want to upset the tree and FeinCMS with it, all you have to do is change one of the lft or rght numbers in the database.
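
To make the lft and rght numbers a bit more tangible, here is a small sketch that numbers a made-up page tree the MPTT way. It is not the code FeinCMS or django-mptt actually runs; the function names and the page names are invented for this example.

```python
def number_tree(children, node, level=0, counter=1, rows=None):
    """Walk the tree depth-first and hand out lft/rght numbers.

    Returns (rows, next_free_number).
    """
    if rows is None:
        rows = {}
    lft = counter
    counter += 1
    for child in children.get(node, []):
        rows, counter = number_tree(children, child, level + 1, counter, rows)
    rows[node] = {'level': level, 'lft': lft, 'rght': counter}
    return rows, counter + 1


def descendants(rows, node):
    """Everything whose lft lies between the node's own lft and rght."""
    top = rows[node]
    return sorted(name for name, row in rows.items()
                  if top['lft'] < row['lft'] < top['rght'])


# Hypothetical page tree: parent -> list of children
children = {
    'home': ['about', 'products'],
    'products': ['widgets', 'gadgets'],
}

rows, _ = number_tree(children, 'home')
print(rows['home']['lft'], rows['home']['rght'])          # 1 10
print(rows['products']['lft'], rows['products']['rght'])  # 4 9
print(descendants(rows, 'products'))  # ['gadgets', 'widgets']

# Change a single lft number and the page drops out of its parent
rows['widgets']['lft'] = 11
print(descendants(rows, 'products'))  # ['gadgets']
```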

This totally failed to bubble into my mind when I heard about a FeinCMS page that was not displaying. Confused, I tried to reproduce the bug, without result. After a while of trying and fiddling, a colleague pointed out the MPTT principle and the fact that FeinCMS uses it. After a little more fiddling, we found out that the tree was indeed a little 'borked', confusing FeinCMS so much that it eventually rendered a 404-page. Lucky for us, FeinCMS plugs two management commands into Django, to fix problems like this: rebuild_mptt ('Only use in emergencies') and rebuild_mptt_direct ('should only be used to repair damaged databases').

Now that the problem was fixed, all we needed to find out was why the tree fell apart in the first place. The smart colleague I mentioned before pointed at fixtures. I had made an initial_data fixture to make sure some of the pages were already there for the customer. In the fixture, I tried to kind of build the tree (using tree_id, parent_id, lft, rght and level), to make sure the pages were ordered as I wanted them to be. However, I forgot that initial_data fixtures are loaded again every time you run migrations. It turned out the customer had added a new page to the tree before we ran a migration for an update of the site, and the migration replaced the new rght and lft numbers with the ones from the fixture. Renaming the initial_data fixture, so that Django no longer loads it automatically, took care of this risk.

A lot was learned today.

Tuesday, May 31, 2011

Django Class-based Views

Since the release of Django 1.3, developers can choose to use class-based views in their web apps. Since the announcement of class-based views, a lot has been said about them. As with all changes, there are pros and cons, people who are excited and people who are disappointed. I, and I guess a lot of people with me, am excited by the class-based views, but disappointed by the documentation Django gives with them. Time to try to clear things up.

What do I like about Django's class-based views?

Well, to start with: consistency. It might sound a bit lame, but I think it's a great thing that, following models and forms, views are now also part of the class-based club. I somehow always thought it was weird to write your forms and models in a class, but your views in a function. This weird feeling is now gone. Lucky me.

Secondly: consistency. Again? Yes, again, because you can now force your own code to be more consistent, by using subclassing. You can, for example, write your own superclass view template and let all your other views subclass it.

And last: Subclassing. I found it hard to find an example to make my point, but it might be obvious that subclassing is a good thing in general. If you have two slightly different views, subclassing reduces the amount of code that you will copy/paste and adapt a little, and thus reduces redundancy. Positivity all over the place!

What don't I like about Django's class-based views?

I have mentioned it before in this post: The documentation. Whenever you Google or Bing or whatever for Django class-based views, you will encounter a bunch of pages about Django's class-based generic views. It may be something personal, but I don't really like generic views. Whenever I use them, it feels like I am losing control over the website I am trying to create.

Moreover, the Django documentation seems to put a lot of custom logic, like what templates to use, in the URL config. Which is another thing that rings my developer alarms. The URL config, in my opinion, is a bunch of rules that tells my application what view should be called on what URL. Taking away some of the functionality of the views and putting it into the URL config just doesn't feel right.

So we don't want the class-based generic views, but ordinary class-based views. The way we used to have views in Django 1.2.x, but now with classes. So we Google and Bing and whatever along, to find any documentation covering ordinary class-based views. My Google skills are not something to brag about, but I could not find any information about non-generic class-based views.

So what do we do now?

I talked this over with one of my colleagues. He agreed with me about the lack of non-generic class-based views documentation and the fact that Django putting logic into the URL config is kind of ugly. He told me how he uses the class-based views in his projects and I decided to go with it, just because it made some sense to me.

In my current projects, all of my views are subclassing django.views.generic.View and optionally django.views.generic.base.TemplateResponseMixin. That way you can easily convert your former function-based views to new and cool class-based views. A code example to summarize and conclude.

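from django.contrib import messages
from django.http import HttpResponse
from django.views.generic import View
from django.views.generic.base import TemplateResponseMixin

from .forms import SomeForm  # assumed location of the form class


class SomeFormView(TemplateResponseMixin, View):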
    template_name = 'some_form.html'

    def get(self, request):
        form = SomeForm()

        return self.render_to_response({
            'form': form,
        })

    def post(self, request):
        form = SomeForm(request.POST)

        if form.is_valid():
            form.save()
            messages.success(request, 'Your form has been saved!')

        return self.render_to_response({
            'form': form,
        })


class AjaxThingView(View):
    # Note that I don't subclass the TemplateResponseMixin here!

    def get(self, request):
        return HttpResponse(status=404)

    def post(self, request):
        id = request.POST.get('id')

        # Do something with the id
        return HttpResponse('some data')

Monday, April 4, 2011

Django howto: Non-conflicting slugs

A while ago I was testing a Django project at work. In the project we had a Django app called Groups. To create a group, you should point the browser to www.domain.tld/group/create/ and to view a group, you had to point your browser to www.domain.tld/group/<group_slug>/. Of course a group slug is unique, so we should never have any conflicts. That is, until I decided to create a group with the slug 'create'.

As expected, I was faced with the 'Create a new group'-page when I tried to view my pretty new group. When I swapped some URLs in the Groups app, everybody who tried to create a new group was served the page of my beautiful group. Everything worked as expected, but still it was considered a bug. This was not the kind of functionality the customer was looking for. For the time being we just let it be (what are the chances the customer would create a group with the slug 'create'?), but the case kept whining in the back of my head.
Until today. I decided to tackle the problem. And of course, it was easier than I thought.

After entering some smart queries, Google told me that there is something called the Django test client. Django says: 'The test client is a Python class that acts as a dummy Web browser, allowing you to test your views and interact with your Django-powered application programmatically.' (source) In other words: The test client can be used to send a request to and get a response from your server. This could come in handy!

Long story short: I decided to create a SaveSlugField that uses the Django test client to check if a certain slug (or URL) already exists. If the slug already exists, it throws a standard validation error. Here's the code you've been craving for:

from django import forms
from django.test.client import Client


class SaveSlugField(forms.SlugField):

    def slug_url_exists(self, slug):
        """
        This function uses the Django test Client to determine whether or
        not a given URL (or slug) is already being used in this website.
        """

        c = Client()
        if not slug.startswith('/'):  # also safe for an empty slug
            slug = '/' + slug # Add a slash to create a valid URL

        response = c.get(slug)
        if response.status_code == 404:
            return False

        return True

    def clean(self, *args, **kwargs):
        data = super(SaveSlugField, self).clean(*args, **kwargs)

        if self.slug_url_exists(data):
            raise forms.ValidationError('This slug is not available')

        return data

I think the code is pretty self-explanatory. This SaveSlugField checks if the URL www.domain.tld/<slug>/ is still available. Of course you can use subclassing (like I subclassed SlugField) to check for your custom slugs/URLs like www.domain.tld/group/<group_slug>/.

If there is a better, prettier, easier or faster way to solve this problem, do not hesitate to let me know in the comments!

EDIT:
Thanks @ polichism. I was afraid I might be kind of confusing ;-)
For the sake of clarity: The URL config in the Group app probably looked a little like this:
url(r'^group/create/$', 'create', name = 'groups.create'),
url(r'^group/(?P<slug>[-\w]+)/$', 'view', name = 'groups.view'),
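
The conflict itself can be reproduced without Django. The sketch below only imitates the 'first matching pattern wins' behaviour of a URL config with the two regular expressions above; the resolve function and URL_PATTERNS list are made up for this example and are not Django's resolver.

```python
import re

# The two patterns from the URL config above, in the same order
URL_PATTERNS = [
    (r'^group/create/$', 'create'),
    (r'^group/(?P<slug>[-\w]+)/$', 'view'),
]


def resolve(path, patterns=URL_PATTERNS):
    """Return the view name of the first pattern that matches the path."""
    for regex, view_name in patterns:
        if re.search(regex, path):
            return view_name
    return None


print(resolve('group/my-group/'))  # view
print(resolve('group/create/'))    # create, so a group with slug 'create' is unreachable

# Swap the patterns and the 'Create a new group' page becomes unreachable instead
swapped = list(reversed(URL_PATTERNS))
print(resolve('group/create/', swapped))  # view
```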

Wednesday, October 20, 2010

[UPDATED] Python: Don't touch the list you're iterating

If you are a developer that likes safe code, there's probably nothing in this post you don't know. However, if you have read my previous post, you already know that I'm a lazy developer who likes his programming languages to tell him he's a jerk. I stumbled across a Python feature that didn't tell me I am a jerk. Let's call it a quirk.

Enough poetry. The problem: I made a list and I wanted to remove objects from that list when they met a certain condition. The thing I did looked a little bit like the following snippet of code.

roll = [1, 2, 3, 4]
for segment in roll:
    if segment == 2:
        roll.remove(segment)
    else:
        print(segment)

When you run this snippet, it prints 1 and 4. Indeed, 3 is missing, and nothing tells you that you are a jerk. The list ends up as [1, 3, 4], which is what we asked for, but things can get really messed up. What if we try to remove the segments that are equal to two as well as three? Removing 2 shifts 3 into the slot the loop has just visited, so the loop's internal position skips right past it. The jerk in this story ends up really confused, with a list like [1, 3, 4], while he thought it would be [1, 4].
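
The skipping is easier to see when the loop's position is written out by hand. The while loop below is only an imitation of what the for loop does internally, with index and visited as made-up names.

```python
roll = [1, 2, 3, 4]
visited = []
index = 0  # stand-in for the for loop's internal position

while index < len(roll):
    segment = roll[index]
    if segment == 2:
        roll.remove(segment)  # everything after 2 shifts one place to the left
    else:
        visited.append(segment)
    index += 1  # the position still moves on, so 3 is never looked at

print(visited)  # [1, 4]
print(roll)     # [1, 3, 4]
```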

Of course you are now craving for the solution of this problem. And of course it is really simple. Just make a copy of the list you try to alter and iterate through that one. Then alter the list you want to alter. Do I make myself sound confusing again? This will probably enlighten you:

roll = [1, 2, 3, 4]
copy = list(roll) # list() makes a copy
for segment in copy:
    if segment == 2 or segment == 3:
        roll.remove(segment)

print(roll)

See, '[1, 4]'. Just the way you like it.
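
Another way to get there, if you don't need the explicit loop, is to build the filtered list in one go. This is just an alternative sketch of the same idea; the variable alias is only there to show the difference between the two forms.

```python
roll = [1, 2, 3, 4]
alias = roll  # a second name for the same list

# Build a new list that only keeps the segments we want
filtered = [segment for segment in roll if segment not in (2, 3)]
print(filtered)  # [1, 4]

# Slice assignment replaces the contents of the existing list,
# so every other name for that list sees the change as well
roll[:] = filtered
print(alias)  # [1, 4]
```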

*** UPDATE ***

The example above works with removing a segment from a list. If you really want to mess things up, try appending segments. Every time you iterate through the list, you can add a segment. When you add a segment, the loop needs one more iteration. One more iteration, in which one more segment will be added and so on.

Try the following snippet AT YOUR OWN RISK!
roll = [1]
for segment in roll:
    roll.append(segment+1)
    print(segment)

If you want, leave a comment with the segment at which the program died.