Bio-Python

"Ramblings on computational chemistry, in silico experiments and programming in python 3.x"

February 28, 2013

Python 3: Filter function and list comprehensions

Many of you who are familiar with Python 2 and have switched to Python 3 may already know that functions such as filter, map and range, which used to return a list, no longer do so. In Python 3, filter and map return iterators, while range returns a lazy range object.
If you wish to use filter, map, etc. and get a list in Python 3, you have two techniques at your disposal:
  1. Wrap the iterator in list( )
  2. Use list comprehensions instead
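
To see the difference for yourself, the quick check below shows that filter( ) and map( ) hand back lazy iterators rather than lists, and that an iterator can be consumed only once. This is a minimal sketch; the helper is_even is a hypothetical name chosen for illustration.

```python
# Python 3: filter() and map() return lazy iterators, not lists.
def is_even(n):
    """Hypothetical helper: True when n is divisible by 2."""
    return n % 2 == 0

evens = filter(is_even, range(10))
squares = map(lambda n: n * n, range(5))

print(type(evens).__name__)     # filter
print(type(squares).__name__)   # map
print(isinstance(evens, list))  # False

# An iterator is used up as you walk through it.
first_pass = list(evens)   # [0, 2, 4, 6, 8]
second_pass = list(evens)  # [] because the iterator is already exhausted
print(first_pass, second_pass)

# range() is a lazy sequence object rather than an iterator,
# so it can be traversed more than once.
r = range(3)
print(list(r), list(r))  # [0, 1, 2] [0, 1, 2]
```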

Wrapping the iterator in list( )

The following snippet returns all even numbers up to but not including 10.
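
The original listing is not reproduced here, so what follows is only a minimal sketch of the filter( ) approach. The predicate name is_even is an assumption; any function that returns True for even numbers would do. Note that range(10) starts at 0, so 0 is included as an even number.

```python
def is_even(n):
    """Hypothetical predicate: True for even integers."""
    return n % 2 == 0

# filter() returns an iterator in Python 3; list() materialises it.
evens = list(filter(is_even, range(10)))
print(evens)  # [0, 2, 4, 6, 8]

# The same thing written with an anonymous (lambda) function:
evens_lambda = list(filter(lambda n: n % 2 == 0, range(10)))
print(evens_lambda)  # [0, 2, 4, 6, 8]
```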

Using list comprehension technique

We can rewrite the above program to do the same task using the more familiar list comprehension.
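
A sketch of the list comprehension version, under the same assumption that the range starts at 0. The generator expression at the end is included only for comparison: it is the lazy counterpart, behaving much like the iterator returned by filter( ).

```python
# The list comprehension builds the list directly; no list() wrapper is needed.
evens = [n for n in range(10) if n % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]

# A generator expression (round brackets) is lazy, like filter().
lazy_evens = (n for n in range(10) if n % 2 == 0)
print(list(lazy_evens))  # [0, 2, 4, 6, 8]
```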

This also gives us all even numbers up to but not including 10.

Leave a comment explaining how these two coding styles differ in efficiency and readability.

February 26, 2013

Checking 'nan' and 'inf' the Pythonic way

How do you check for Not a Number (NaN) or Infinity (Inf) in Python?

NaN:

In Python you can create 'nan' as a float.
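
A minimal sketch using only the standard library's math module. Because NaN compares unequal to everything, including itself, an equality test such as x == float('nan') never succeeds; math.isnan( ) is the reliable check.

```python
import math

x = float('nan')      # the string 'nan' (case-insensitive) parses as a float
print(x)              # nan
print(math.isnan(x))  # True

# NaN is not equal to anything, not even to itself,
# so `x == float('nan')` cannot be used as a test.
print(x == x)             # False
print(x == float('nan'))  # False
```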

Inf:

When you try print(1E650), it prints inf. In Python, a variable x can be checked for 'inf' using math.isinf(x).
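
A short sketch of the checks described above. Python floats are C doubles, whose largest finite value is about 1.8E308, so the literal 1E650 overflows to inf. math.isfinite( ), available since Python 3.2, is shown as a convenient way to reject both inf and nan at once.

```python
import math

big = 1E650             # too large for a double, so the literal becomes inf
print(big)              # inf
print(math.isinf(big))  # True

y = float('inf')        # infinity can also be created from a string
print(math.isinf(y), math.isinf(-y))  # True True
print(y == big)         # True

# math.isfinite() rejects both inf and nan in one call.
print(math.isfinite(1.0), math.isfinite(y), math.isfinite(float('nan')))  # True False False
```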

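For more on 'nan' and 'inf', refer to PEP 754 (IEEE 754 Floating Point Special Values); although that PEP was rejected, it still discusses these values.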

February 25, 2013

Preference of language in scientific computing - why we speak python more often now

"Exploratory programming" is the most appropriate term. It aptly describes how scientists without a computer science background do computation: they do not know beforehand which implementation will lead to an answer. When Fortran was the language of choice in scientific computing, data generation was the central focus. As knowledge advanced across the sciences, data processing became equally important, if not more so. The role of biologists and chemists doing computation has evolved over the years; today researchers are typically required to perform numerical calculations on the one hand and, on the other, to carry out diverse tasks such as administering databases, retrieving information from them and managing workflows. Therefore Python is the most viable option in such a scenario.

Another important reason for Python's popularity was explained by Eddie Cao (Sr. Informatics Architect at Novartis) during PyCon US 2012: "... For us python has lot of advantages, at the top of my mind is the fact that python is very approachable. We support scientists that are not computer scientists by training. They are chemists and biologists, but they can actually look at python code and make so many changes."

Much of the useful code in science and engineering is already written in languages like Fortran and MATLAB+, and this legacy code is time-tested. Python works as glue by mapping the data structures used in these classical (legacy) codes to Python objects. For example, F2py makes it possible to wrap complete Fortran modules so that they can be called from Python.

Some Python-based projects:

This list could be very long, but here are a few projects that are popular and/or of personal interest to me.

| Project | Description | Website |
|---|---|---|
| NumPy | Provides support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays | http://numpy.org/ |
| SciPy | Library for mathematics, science, and engineering; it in turn depends on NumPy, which provides array manipulation | http://scipy.org/ |
| IPython | Interactive Python, with three interfaces: (1) shell-based, (2) QtConsole, (3) web-based Notebook | http://ipython.org/ |
| Matplotlib | 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms | http://matplotlib.org/ |
| MayaVi | 3D plotting library | http://mayavi.sourceforge.net/ |
| Pandas | Library providing high-performance, easy-to-use data structures and data analysis tools | http://pandas.pydata.org/ |
| PyQuante++ | Suite of programs for developing quantum chemistry methods, written in Python, Numerical Python and C | http://pyquante.sourceforge.net/ |
| MMTK | Molecular Modelling Toolkit | http://dirac.cnrs-orleans.fr/MMTK/ |
| BioPython | Set of tools for biological computation | http://biopython.org/ |



+ MATLAB is a registered trademark of The MathWorks, Inc.
++ Rate-limiting steps in PyQuante are written in C. If you hope to see a complete Python-based quantum chemistry program some time in the future, consider contributing some money to the PyPy project's fundraising. I really do not know if that is an exaggeration, but the PyPy developers are known for doing some crazy stuff (including the development of RPython, a faster although restricted subset of Python!).

February 17, 2013

Python-3 GUI. What developer options exist?

After reading the article about "file reading and copying", a few readers asked me, via comments on G+ and email, to write about GUI development in Python 3. Frankly, I have never been a GUI fan, but for the sake of these requests I explored some of the available development options. My preferences are summarized below:

For Python 2.7: wxPython > PyQt > tkinter
For Python 3.x: PyQt > tkinter (no wxPython support at the time of writing)

There are more options available which I have not explored yet. (An exhaustive list can be found here: GUI programming.)

First and foremost, I'd like to say that wxPython is one of the best projects according to my literature survey (notably, GvR described it as the most mature GUI project), but since it has not been ported to Python 3.x, I will talk only about PyQt4. wxPython is under very active development, and hopefully it will soon be released for py3k and its successors.

Why not show love for tkinter?

Yes, it ships with Python, and it is stable and simple, but what I don't like about tkinter is that it often has a non-native look. Tkinter windows (or frames, if you like to call them that) often look strikingly different from the current desktop theme. Secondly, unlike PyQt and wxPython, tkinter does not have a very active user community.

IDLE, Python's own IDE, is built with tk. On my Arch Linux machine running the Xfce desktop environment, its frames appear noticeably different from those of other applications.

Importing ttk (themed tk) is one way to address the incongruous appearance of tkinter. (But be aware that ttk widgets do not have all the methods of their tk counterparts, so don't be surprised if you cannot call select/deselect on a ttk Checkbutton.) Call me biased, but I find PyQt and wxPython more tweakable than tkinter.

Important note:

PyQt comes in two different license versions:
  1. GPL (code must be made available)
  2. Commercial license (costs money)
There's also PySide, which is mostly but not completely compatible with PyQt and comes with a less restrictive license (LGPL) that allows both open-source and proprietary software development. This difference in licensing may be of great significance to those who want to develop commercial applications.
As of now, PyQt4 is the only mature project I have found that is ready for Python 3. If you are using another Python GUI development tool, tell me how it is superior to the ones mentioned here (tk, wxPython and PyQt), and mention whether it is available for Python 3. In the coming weeks I will briefly summarize the PyQt4 modules and show how to develop simple frames.

February 14, 2013

Proposal for "Bioinformatics & Computational Chemistry" on the Stack Exchange Network

Stack Exchange Q&A site proposal: Bioinformatics & Computational Chemistry

I cordially invite you to help create a "Bioinformatics & Computational Chemistry" community on Stack Exchange. Stack Exchange is a fast-growing network of question and answer sites. By participating in the Q&A system, you can:

• Ask questions in any area within or across computational chemistry/biology disciplines
• Find out whether others have questions that you have encountered in your research or professional career, and answer them
• Vote for questions and answers
• Build a reputation as a bioinformatics or computational chemistry expert
• Help build a knowledge base for computational chemistry/biology

Let us create a visible, sustainable and open chemistry and biology community on Stack Exchange.


Why Stack Exchange?

It is easy to put up a Q&A system, but it is much more difficult to develop a successful, sustainable community. Stack Exchange requires a community to build itself and prove its viability through the commitment of its users. This is done through its Area 51 process.

How to Participate?

To create a Bioinformatics and Computational Chemistry Q&A community hosted by Stack Exchange, we need to demonstrate participation and generate a sustained flow of Q&A exchanges. The initial steps are:

1. Visit: http://area51.stackexchange.com
2. Create an account: click here
3. Go to the "Bioinformatics and Computational Chemistry" proposal: click here
4. Click "Follow It!"
5. Ask example questions. Your reputation grows as your questions and comments are voted up by other users. Discuss whether questions posted by others are suitable for the community.
6. Vote for questions. You help others earn a reputation.


Stages:

Definition ==> Commit ==> Beta ==> Stack Exchange site
(This community is currently in the definition phase.)

I hope you find the "Bioinformatics and Computational Chemistry" community useful and that it achieves "Commit" status soon.


CRITERIA FOR “BETA” STATUS: 200 participants; 100 participants with 100+ reputation points on other Stack Exchange sites

CRITERIA FOR A STACK EXCHANGE SITE: 15 questions a day; 90% of questions answered; hundreds of users; more than one answer per question; 1500 visits a day


February 10, 2013

Reading a file and how to make a copy of a file in Python 3.x

Today, I'm going to write about ways of reading the contents of a file using Python 3. There are two common ways to do this: a not-so-good way and a better way. We will also see how to copy the contents of one file into another file.

We will work with a file named "acetylene.xyz". The contents of this file are as follows:

Reading a file METHOD 1: Not-so-good way

Using try/finally, one can read the file as follows:
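
The original listing is not reproduced here, so the following is only a minimal sketch of the try/finally pattern, wrapped in a hypothetical helper read_with_try_finally( ) and assuming the file is UTF-8 encoded. The variable name a_file matches the one used later in this post.

```python
def read_with_try_finally(filename):
    """Hypothetical helper: read a whole text file, closing it in a finally clause."""
    a_file = open(filename, encoding='utf-8')
    try:
        return a_file.read()
    finally:
        # Runs whether or not read() raised, so the file is always closed.
        a_file.close()

# Usage, assuming acetylene.xyz exists in the current directory:
# print(read_with_try_finally('acetylene.xyz'))
```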

In the above code, we called the built-in function open( ), which takes a filename as a required argument. You can also provide a mode such as 'r', 'w' or 'a' (read, write, append, etc.) as an optional argument. open( ) also takes an encoding ('utf-8', 'ascii', etc.) as an optional argument. When you don't specify the encoding, Python uses locale.getpreferredencoding( ) to determine your environment's default encoding.
Executing the above code indeed gives the expected output.
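
The sketch below exercises the three modes mentioned above with an explicit encoding argument, then prints the platform default that open( ) falls back on. The file name example.txt is hypothetical; opening it with 'w' overwrites any existing file of that name. Each file is closed by hand, in the style of Method 1.

```python
import locale

filename = 'example.txt'  # hypothetical file name used only for illustration

# 'w' creates the file (or empties an existing one) for writing.
out = open(filename, 'w', encoding='utf-8')
out.write('first line\n')
out.close()

# 'a' opens the file for appending; writes go to the end.
out = open(filename, 'a', encoding='utf-8')
out.write('second line\n')
out.close()

# 'r' (the default mode) opens the file for reading.
inp = open(filename, 'r', encoding='utf-8')
text = inp.read()
inp.close()
print(text, end='')  # prints the two lines written above

# Typically the encoding open() uses when no encoding argument is given;
# the exact value depends on your platform and locale settings.
print(locale.getpreferredencoding(False))
```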

Reading a file METHOD 2: A better way

Here we will use the with statement, which was introduced in Python 2.5 (see PEP 343) as a replacement for the common try/finally pattern. We can read the same file using the with statement as follows:
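
Again as a hedged sketch (the helper name read_with_with( ) and the UTF-8 encoding are assumptions), the same read with a with statement looks like this:

```python
def read_with_with(filename):
    """Hypothetical helper: read a whole text file; leaving the with block closes it."""
    with open(filename, encoding='utf-8') as a_file:
        contents = a_file.read()
    # Here, outside the block, a_file is already closed.
    return contents

# Usage, assuming acetylene.xyz exists in the current directory:
# print(read_with_with('acetylene.xyz'))
```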

The above code gives the same output as Method 1.

In Method 2, the file stays open only while you are within the code block of the with statement. The moment you leave this block, Python closes the file for you, just as if you had called a_file.close( ). You don't have to do it yourself as in Method 1.
Now you may be wondering why I say that Method 2 is better than Method 1. Mark Pilgrim explains this in his book Dive Into Python 3:
"Stream objects have an explicit close( ) method, but what happens if your code has a bug and crashes before you call close( )? That file could theoretically stay open for much longer than necessary. While you’re debugging on your local computer, that’s not a big deal. On a production server, maybe it is."

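To see Pilgrim's point in action, the sketch below simulates a bug by raising an exception inside the with block and then checks the file object's closed attribute. The file name with_demo.txt is hypothetical and the file is created on the fly.

```python
filename = 'with_demo.txt'  # hypothetical scratch file

with open(filename, 'w', encoding='utf-8') as f:
    f.write('some data\n')

a_file = None
try:
    with open(filename, encoding='utf-8') as a_file:
        a_file.readline()
        raise RuntimeError('simulated bug before close()')
except RuntimeError as err:
    print('caught:', err)  # caught: simulated bug before close()

# The with statement closed the file even though the block raised.
print(a_file.closed)  # True
```

With the try/finally version, the same guarantee holds only if you remember to write the finally clause every time; the with statement builds it in.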
How to copy the contents of one file into another file?
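
The original listing is not reproduced here. The behaviour described next (a file copied into an existing backup directory while keeping its name) matches what shutil.copy( ) does when its destination is a directory, so the sketch below assumes that approach. It also shows a manual copy that reads one file and writes another. The helper names and the directory name backup are hypothetical.

```python
import os
import shutil

def copy_into_directory(src, backup_dir):
    """Hypothetical helper: copy src into backup_dir, keeping its file name.

    When the destination is an existing directory, shutil.copy() places
    the copy inside it under the same base name as src.
    """
    os.makedirs(backup_dir, exist_ok=True)  # make sure the directory exists
    return shutil.copy(src, backup_dir)     # returns the path of the new file

def copy_manually(src, dst):
    """Hypothetical helper: copy a text file line by line."""
    with open(src, encoding='utf-8') as inp, open(dst, 'w', encoding='utf-8') as out:
        for line in inp:
            out.write(line)

# Usage, assuming acetylene.xyz exists in the current directory:
# copy_into_directory('acetylene.xyz', 'backup')
```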


Using the above code, the file "acetylene.xyz" is copied into a backup directory. The backup file will also be named acetylene.xyz, because we did not give a file name and backup was an already existing directory.

Acknowledgement

Special thanks to Wesley Chun (author of Core Python Programming) for his valuable suggestions on improving this post.