Tuesday, February 22, 2011

Calling BEDtools from R

BEDtools suite provides command-line functionality when dealing with genomic coordinate based operations, such as overlapping bed files or getting coverage of a bed file over a genome (similar, not exactly same, functionality in R is provided by IRanges package in bioconductor). If you have the BEDtools suite installed and it is in your path, you can easily call BEDtools executables from R using the "system()" command. See BEDtools website to learn about BEDtools installation.

The function pasted below calls BEDtools executables on two data-frames. Both data-frames are structured as bed files. The function first writes out the data-frames as temporary files and calls the BEDtools programs on those temporary files and writes the output to another temporary file. In the end the function reads in the resulting file as a data frame and deletes the temporary files. I guess you can do this more elegantly with pipe() command in R, but that is left as an exercise to you: )


bedTools.2in<-function(functionstring="bedIntersect",bed1,bed2,opt.string="")
{
  #create temp files
  a.file=tempfile()
  b.file=tempfile()
  out   =tempfile()
  options(scipen =99) # not to use scientific notation when writing out
 
  #write bed formatted dataframes to tempfile
  write.table(bed1,file=a.file,quote=F,sep="\t",col.names=F,row.names=F)
  write.table(bed2,file=b.file,quote=F,sep="\t",col.names=F,row.names=F)
 
  # create the command string and call the command using system()
  command=paste(functionstring,"-a",a.file,"-b",b.file,opt.string,">",out,sep=" ")
  cat(command,"\n")
  try(system(command))
 
  res=read.table(out,header=F)
  unlink(a.file);unlink(b.file);unlink(out)
  return(res)
}
 
>bed1
       V1    V2      V3       V4                   V5    V6
1     chr1 67051161 67052451 ENST00000371026       1      -
2     chr1 67060631 67060788 ENST00000371026       2      -
3     chr1 67065090 67065317 ENST00000371026       3      -
4     chr1 67066082 67066181 ENST00000371026       4      -
5     chr1 67071855 67071977 ENST00000371026       5      -
6     chr1 67072261 67072419 ENST00000371026       6      -
 
>bedTools.2in("bedIntersect",bed1,bed2) # equivalent to "bedIntersect -a bed1 -b bed2" on terminal

5 comments:

  1. Hi, I use a SGE cluster to run BEDtools. I have to source the suite every time I run any of the BEDtools commands. In teh cluster, I want to run a R batch job wherein I want to call BEDtools. How do I get teh BEDtools suite in my path from R to run the function you suggest.
    Thanks for your help.
    Hari

    ReplyDelete
  2. Try giving the full path to the executable as the function string executable as in functionstring="/home/user/bin/bedIntersect"


    ReplyDelete
  3. Hi,
    is there a way to do a similar thing in R with closestBed and the -d option?
    Thanks for your help!

    ReplyDelete
  4. Thanks Altuna, very useful. I created mergeBed function for R, saved at github: https://github.com/zx8754/R_UDF/blob/master/udf_BEDTools_bmerge.R

    ReplyDelete