Tuesday, October 25, 2011

Django with solr - Now you are true web-dev



Apache Solr is an extremely powerful, enterprise-level search engine that can be used to store billions of records. Anyone with MySQL experience will know how query time can start to degrade once a table grows past roughly 1,000,000 rows. After doing tons of research to find a quick and reliable alternative for search, I stumbled upon the Apache Solr project. The general consensus is that Solr is lightning fast, and after using it for a recent project I definitely agree.

So this should be great news for a web developer who is looking for such a solution. Just go to the Apache Solr website, download and install the software, and you’re set, right? Wrong! Fair warning: integrating Apache Solr with your web server is a complete project in itself. It’s difficult mainly because of the lack of quality information online, so I’d like to share my knowledge with all you Apache Solr noobs so you don’t have to rip your hair out of your skull. If this guide saves just one hair follicle, then I’ve done my job. For women with mustaches, you may not want to continue reading, as you may want to lose some facial hair.




The Downloads

Before I get started, I should let everyone know that this guide is mainly for Windows XP users, although there is a slight variation to the steps for anyone using the new Windows 7.

1. Download Xampp For Windows, Basic Package. http://www.apachefriends.org/en/xampp-windows.html

2. Download the Tomcat Add-on. Tomcat is a Java server, and because Solr is a Java web application, this setup needs Tomcat to run it.

3. Download Java JDK http://java.sun.com/javase/downloads/index.jsp

4. Download Apache Solr from one of the mirrors. I got version 1.4.0 but I believe any version will do. http://www.proxytracker.com/apache/lucene/solr/

5. Download the Solr PHP Client. http://code.google.com/p/solr-php-client/




The Installation

1. Install Xampp, and follow the instructions.

2. Install Tomcat, and follow the instructions.

3. Install the latest Java JDK.

4. There should now be a folder called /xampp in your C Drive. Enter the xampp folder and find the ‘xampp-control’ application, and start it.



5. Check the Svc box for Apache, MySQL, and Tomcat. This installs these applications as Windows services.



6. Click the ‘SCM’ button and you should get a Windows Service Window.



7. Find the Apache Tomcat Service, then Right click it and go to ‘Properties’. Here you will set the Startup Type to Automatic, and close the properties window. We want Tomcat to start every time Windows boots up.




8. Now highlight Apache Tomcat in the Services Window, and click the option to Stop the Service if it’s not already Stopped. Tomcat has to be disabled for the next few steps.



9. Extract Apache Solr, then go into the /dist folder. There should be a file called apache-solr-1.4.0.war; copy this file.



10. Now find the folder C:/xampp/tomcat/webapps/ and paste the apache-solr-1.4.0.war file into it. Rename apache-solr-1.4.0.war to solr.war.



11. Go back to the extracted Apache Solr folder and go to /example/solr/ then copy these files.



12. Create a New directory in C:/xampp/ called /solr/. You will now paste the /example/solr/ files into this directory.



13. Now find C:/xampp/tomcat/bin/tomcat6w, click on the Java tab, and add the option “-Dsolr.solr.home=C:\xampp\solr” to the Java Options section.



14. Now go back to the Windows Services Window, and start Apache Tomcat.

15. Open up a browser and go to “http://localhost:8080/solr/admin/” to confirm a successful installation of Apache Solr. You should see the Apache Solr administration screen. If you see a bunch of error codes instead, then something went wrong. You might want to consider uninstalling everything, then starting over and following the directions more carefully.
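If you would rather check the installation from a script than from the browser, a minimal Python 3 sketch like the one below should do. It assumes the admin URL from step 15. The function name `solr_is_up` and its `opener` parameter are made up for this example and are not part of Solr or Tomcat.

```python
from urllib.error import URLError
from urllib.request import urlopen

# URL from step 15; adjust host and port if your Tomcat listens elsewhere.
SOLR_ADMIN_URL = "http://localhost:8080/solr/admin/"


def solr_is_up(url=SOLR_ADMIN_URL, opener=urlopen, timeout=5):
    """Return True if the Solr admin page answers with HTTP 200."""
    try:
        response = opener(url, timeout=timeout)
    except (URLError, OSError):
        # Connection refused, timeout, HTTP error status, and so on.
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling `solr_is_up()` after step 14 should return True once Tomcat has finished deploying solr.war. The `opener` argument is only there so the check can be exercised without a running server.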






Python libraries to access Solr

Python API

There is a simple client API as part of the Solr repository: http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/client/python/

Note: As of version 1.3, Solr no longer comes bundled with a Python client. The existing client was not sufficiently maintained or tested as development of Solr progressed, and committers felt that the code was not up to our usual high standards of release.

solrpy

solrpy is available at The Python Package Index so you should be able to:

easy_install solrpy

Or you can check out the source code and:

python setup.py install

PySolr

There is an independent "pysolr" project available ... http://code.google.com/p/pysolr/

And Python Solr, an enhanced version of pysolr that supports pagination and batch operations.

insol

Another independent Solr API, focused on ease of use in large-scale production environments. It aims to be clean and fast, and is still in development.

http://github.com/mdomans/insol

sunburnt

Sunburnt is an actively-developed Solr library, both for inserting and querying documents. Its development has aimed particularly at making the Solr API accessible in a Pythonic style. Sunburnt is in active use on several internet-scale sites.

http://pypi.python.org/pypi/sunburnt

http://github.com/tow/sunburnt

Using Solr's Python output

Solr has an optional Python response format that extends its JSON output in the following ways to allow the response to be safely eval'd by Python's interpreter:

  • true and false changed to True and False
  • Python unicode strings used where needed
  • ASCII output (with unicode escapes) for less error-prone interoperability
  • newlines escaped
  • null changed to None

Here is a simple example of how one may query Solr using the Python response format:

from urllib2 import urlopen
conn = urlopen('http://localhost:8983/solr/select?q=iPod&wt=python')
rsp = eval( conn.read() )

print "number of matches=", rsp['response']['numFound']

#print out the name field for each returned document
for doc in rsp['response']['docs']:
  print 'name field =', doc['name']

With Python 2.6 you can use the literal_eval function instead of eval. This only evaluates "safe" syntax for the built-in data types and not any executable code:

import ast
rsp = ast.literal_eval(conn.read())
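To see the difference without a running Solr server, here is a small offline sketch in Python 3. The `sample` string is a made-up response in the Python format described above, and `hostile` is an illustrative stand-in for what a compromised server could send back.

```python
import ast

# A made-up response in Solr's Python format (note True and None).
sample = ("{'responseHeader': {'status': 0, 'QTime': 1}, "
          "'response': {'numFound': 1, 'start': 0, "
          "'docs': [{'name': 'iPod', 'inStock': True, 'price': None}]}}")

rsp = ast.literal_eval(sample)
num_found = rsp['response']['numFound']
first_doc = rsp['response']['docs'][0]

# Anything that is not a plain literal is rejected instead of executed.
hostile = "__import__('os').getcwd()"
try:
    ast.literal_eval(hostile)
    rejected = False
except ValueError:
    rejected = True
```

With plain eval the `hostile` string would have been executed; literal_eval refuses it and raises ValueError.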

Using normal JSON

Using eval is generally considered bad form and dangerous in Python. In theory if you trust the remote server it is okay, but if something goes wrong it means someone can run arbitrary code on your server (attacking eval is very easy).

It would be better to use a Python JSON library like simplejson (from Python 2.6 on, the standard library's json module offers the same interface). It would look like:

from urllib2 import urlopen
import simplejson
conn = urlopen('http://localhost:8983/solr/select?q=iPod&wt=json')
rsp = simplejson.load(conn)
...

Safer, and as you can see, easy.
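For readers on Python 3, where urllib2 no longer exists, roughly the same thing can be sketched with the standard library alone. The helper names `build_select_url`, `parse_names` and `search` are invented for this example, and the `name` field is assumed to exist in your schema as in the snippets above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, rows=10):
    """Build a /select URL that asks Solr for JSON output."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return "%s/select?%s" % (base_url.rstrip("/"), params)


def parse_names(raw_json):
    """Return (numFound, list of 'name' values) from a Solr JSON response."""
    rsp = json.loads(raw_json)
    docs = rsp["response"]["docs"]
    return rsp["response"]["numFound"], [doc.get("name") for doc in docs]


def search(base_url, query):
    """Run the query against a live Solr instance."""
    with urlopen(build_select_url(base_url, query)) as conn:
        return parse_names(conn.read().decode("utf-8"))
```

Something like `search("http://localhost:8983/solr", "iPod")` would then return the match count and the names. Building the query string with urlencode also takes care of escaping spaces and special characters in the search term.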


For Django developers: use a Django application

Haystack for Django

Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Whoosh, Xapian, etc.) without having to modify your code.

http://docs.haystacksearch.org/dev/toc.html
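As a rough idea of what plugging in Solr looks like, here is a settings.py fragment in the style of Haystack 2.x. Older Haystack releases are configured differently, so check the docs for the version you install. The URL is an assumption based on the Tomcat setup described earlier in this post.

```python
# Django settings.py fragment (Haystack 2.x style).
HAYSTACK_CONNECTIONS = {
    "default": {
        # Solr backend that ships with Haystack
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        # Assumed URL: a Tomcat install serving Solr on port 8080
        "URL": "http://localhost:8080/solr",
    },
}
```

Switching to another backend such as Whoosh or Xapian would then mostly be a matter of changing this one setting.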

Ubuntu users can follow this link to install Solr and access it through Haystack:

http://yuji.wordpress.com/2011/08/18/installing-solr-and-django-haystack-on-ubuntu-with-openjdk/

Wednesday, October 19, 2011

Comet with Django and Ajax push engine ( APE )

Recently I implemented APE in my project. Before APE I was polling the server with continuous Ajax calls, which is obviously a very bad idea. In APE each user has their own pubid, channel, and sessId. The channel is used to send messages to another user and to receive messages back.

Here I just slapped together a really basic message posting app. Just a message with a name and a timestamp.


# Here's the model:
from django.db import models

class Message(models.Model):
    msg = models.TextField('Message')
    timestamp = models.DateTimeField('Timestamp', auto_now_add=True)
    posted_by = models.CharField(max_length=50)
# And here's the view:
from django.shortcuts import render_to_response

def show_messages(request):
    if request.method == 'POST':
        new_msg = Message(msg = request.POST['msg'],
                          posted_by = request.POST['posted_by'])
        new_msg.save()
    messages = Message.objects.all()
    return render_to_response('messages.html', {'messages': messages})
And here’s the template:

<style type='text/css'>
    div.message {
        background: #DDDDDD;
        margin: 10px;
        padding: 10px;
    }
</style>
<div id="current_messages">
    {% for message in messages %}
    <div class="message">
        {{ message.msg }}<br />
        Posted by: <strong>{{ message.posted_by }}</strong><br />
        on: {{ message.timestamp }}
    </div>
    {% empty %}
        <div class="message">
            No Messages
        </div>
    {% endfor %}
</div>
<div id="new_message">
    <form id="msgform" method="post">
    <h2>Post a new message</h2>
    Your Name: <input name="posted_by" type="text" /><br />
    Your Message:<br />
    <textarea cols="50" rows="10" name="msg"></textarea><br />
    <input type="submit" value="Submit Message" />
    </form>
</div>

Ok, so that’s the basic setup… Pretty boring, right? If Sue submits a message, Bob has to refresh his page to see it. So, we’re going to add APE into the mix to update Bob right away. Getting APE up and running is pretty easy; just follow the instructions over at the APE Wiki. We’ll be using APE’s inlinepush server module. Just for good measure we’ll be tossing some jQuery in as well.
So, the basic idea is that we need to connect to an APE “channel”, which we’ll use to receive updated messages. We’ll use jQuery to send the messages. So let’s start by using jQuery to submit the form, with the following script:

function append_message(data) {
    // The server returns a list of serialized objects; the model's
    // values live under the "fields" key of the first entry
    var fields = data[0].fields;
    var message_str = fields.msg + '\nPosted by: <strong>'
                         + fields.posted_by + '</strong> on: '
                         + fields.timestamp;
    var new_div = $('<div />').addClass('message').html(message_str);
    $('div#current_messages').append(new_div);
}
$('#msgform').submit(function() {
    $.post('/ajaxsubmit',
            {posted_by: $("input[name='posted_by']").val(),
               msg: $("textarea[name='msg']").val()},
            append_message, 'json');
     //For brevity, we're just going to assume this always works
     $("textarea[name='msg']").val('');
     return false;
});

And here are the updated views:

from django.core import serializers
from django.http import HttpResponse
from django.shortcuts import render_to_response

def show_messages(request):
    messages = Message.objects.all()
    return render_to_response('messages.html', {'messages': messages})
def ajaxsubmit(request):
    new_msg = Message(msg = request.POST['msg'],
                      posted_by = request.POST['posted_by'])
    new_msg.save()
    jsonified_msg = serializers.serialize("json", [new_msg])
    # Again, we're just going to assume this always works
    return HttpResponse(jsonified_msg, mimetype='application/json')
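The data[0].fields lookup in the script follows from the shape of Django's JSON serializer output: a list with one object per model instance, each carrying pk, model and fields keys. The sketch below rebuilds that shape by hand in Python 3 so it can be inspected without a Django project. The app label "myapp" and all values are made up, and the exact timestamp formatting depends on your Django version.

```python
import json

# Hand-written sample of what serializers.serialize("json", [new_msg])
# returns; "myapp.message" and the field values are made up.
jsonified_msg = json.dumps([{
    "pk": 1,
    "model": "myapp.message",
    "fields": {
        "msg": "Hello, Bob",
        "posted_by": "Sue",
        "timestamp": "2011-10-19T12:00:00",
    },
}])

# The client-side data[0].fields lookup corresponds to this:
data = json.loads(jsonified_msg)
fields = data[0]["fields"]
```

Because the serializer always returns a list, even for a single message, the client has to index into it before reading the fields.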

And so with that, we can now submit messages without reloading the page. Any other users, however, would need to refresh to see the new messages. That’s where APE comes in. Here’s the APE client code (along with the updated append_message function):

var client = new APE.Client();
client.load();
client.addEvent('load', function() {
    // Ask for a name and start the APE session with it
    var posted_by = prompt('Your name?');
    client.core.start({"name": posted_by});
    $("input[name='posted_by']").val(posted_by);
});
client.addEvent('ready', function() {
    //Once APE is ready, join the messages channel and wait for new messages
    client.core.join('messages');
    client.onRaw('postmsg', function(raw, pipe) {
        append_message(raw.data);
    });
});
function append_message(data) {
    var message_str = data.msg + '\nPosted by: <strong>'
                         + data.posted_by + '</strong> on: '
                         + data.timestamp;
    var new_div = $('<div>').addClass('message').html(message_str);
    $('div#current_messages').append(new_div);
}

And here’s the updated view to send new posts to APE:

import json
import urllib2

from django.conf import settings
from django.http import HttpResponse

def ajaxsubmit(request):
    new_msg = Message(msg = request.POST['msg'],
                      posted_by = request.POST['posted_by'])
    new_msg.save()
    # Again, we're just going to assume this always works
    cmd = [{'cmd': 'inlinepush',
            'params': {
                'password': settings.APE_PASSWORD,
                'raw': 'postmsg',
                'channel': 'messages',
                'data': {
                    'msg': new_msg.msg,
                    'posted_by': new_msg.posted_by,
                    # json.dumps cannot serialize a datetime, so send a string
                    'timestamp': new_msg.timestamp.isoformat()
                }
            }
    }]
    url = settings.APE_SERVER + urllib2.quote(json.dumps(cmd))
    response = urllib2.urlopen(url)
    # Updating the message is handled by APE, so just return an empty 200
    return HttpResponse()
You’ll notice I’ve added two settings to settings.py:
APE_PASSWORD = 'testpasswd'
APE_SERVER = 'http://ape-test.local:6969/?'
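To make the request format easier to see, here is a standalone Python 3 sketch that builds the same inlinepush URL without Django or a running APE server. The helper name `build_inlinepush_url` is invented for this example, and the sample message is made up. Note that a datetime has to be turned into a string first, since json.dumps cannot serialize datetime objects.

```python
import json
from datetime import datetime
from urllib.parse import quote, unquote


def build_inlinepush_url(server, password, channel, raw, data):
    """Return the URL that asks APE to push `data` to `channel`."""
    cmd = [{
        "cmd": "inlinepush",
        "params": {
            "password": password,
            "raw": raw,
            "channel": channel,
            "data": data,
        },
    }]
    # The JSON command is URL-quoted and appended after the "?".
    return server + quote(json.dumps(cmd))


url = build_inlinepush_url(
    "http://ape-test.local:6969/?", "testpasswd", "messages", "postmsg",
    {"msg": "Hello", "posted_by": "Sue",
     # Convert the datetime to a string before it reaches json.dumps.
     "timestamp": datetime(2011, 10, 19, 12, 0).isoformat()},
)

# Decoding the query string gives back the original command.
decoded = json.loads(unquote(url.split("?", 1)[1]))
```

The `raw` value ('postmsg') is what the client listens for with client.onRaw, and `channel` has to match the channel the client joined.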

And that’s all there is to it! Like I said earlier, this is a very incomplete application, and a lot of the “boilerplate” work is left as an exercise for the reader. It should be more than enough to get you on your way though. Perhaps in a later post I’ll move the APE update from inside the view to a post_save signal on the model itself. That’s for another time though… UPDATE: Well, another time is today. Check out this post about adding a signal handler.

Reference : http://www.alittletothewright.com/index.php/2010/01/comet-with-django-and-ape/

If you have any questions or run into errors, feel free to comment here. Before that, I would like to note some problems, guidelines, and alternative solutions from my own project.

1)  The ready event does not fire when the page is refreshed:
-> Yes. To get it working you have to use a session to store each user. Or, without a session, you can append a random number to the name when calling client.core.start, which loads a new user each time.

2)  Implementing chat:
-> To implement chat you need to register a new channel for each user. When somebody wants to chat, he or she sends a message to that channel using the pipe.send method, and the target user picks up new messages using the client.onRaw method.

Pagination in Django is now simple with this application

Source code :Download