Summary: Attached is a diff that allows Bio::FlatFileIndex to access BDB flatfile databases created by BioPerl’s Bio::DB::Flat. I have not changed the way BioRuby creates its databases, so this likely breaks access to BioRuby-created flatfile indices.
Background: I have some BDB (Berkeley DB) flatfile databases that were created with BioPerl’s Bio::DB::Flat that I’d like to read with BioRuby’s Bio::FlatFileIndex now that I’ve converted to Ruby. They both ostensibly use the OBDA format.
Problem: Bio::DB::Flat creates two files: config.dat, which stores the DB type, the flatfile format, the namespace info (here a single namespace called ACC), and the list of indexed flatfiles; and key_ACC, the actual BDB containing the ACC namespace index. Bio::FlatFileIndex reads config.dat for the DB type but ignores the rest of its contents. Instead, it expects that information in two additional BDB files called “config” and “fileids”, which Bio::DB::Flat never creates.
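To make the idea concrete, here is a minimal sketch of how everything Bio::FlatFileIndex wants from “config” and “fileids” could be pulled out of a single config.dat-style hash. The tab-separated layout, the sample values, and the helper names `parse_config` and `split_config` are assumptions for illustration; they are not taken from BioRuby or from a real database.

```ruby
# Hypothetical config.dat contents. The tab-separated layout and the path
# are assumptions for illustration only.
SAMPLE_CONFIG = [
  "index\tBerkeleyDB/1",
  "format\tfasta",
  "primary_namespace\tACC",
  "fileid_0\t/data/pombe.fasta\t12345"
].join("\n")

# Parse each line into key => remaining fields (hypothetical helper).
def parse_config(text)
  text.each_line.with_object({}) do |line, h|
    key, *values = line.chomp.split("\t")
    next if key.nil? || key.empty?
    h[key] = values.join("\t")
  end
end

# Split one hash into general settings and the "fileid_" entries
# (hypothetical helper).
def split_config(hash)
  fileids, config = hash.partition { |k, _| k.start_with?("fileid_") }
  [config.to_h, fileids.to_h]
end

config, fileids = split_config(parse_config(SAMPLE_CONFIG))
```

Here `config` ends up holding the general settings and `fileids` only the `fileid_` entries, so neither needs a BDB file of its own.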
It also returns sequences shifted one character to the right: each result ends with the ‘>’ of the next sequence in the file, but lacks the ‘>’ that should mark its own header line.
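The symptom is easy to reproduce on a toy file. The sketch below assumes that the offset stored in the BioPerl-built index is one byte larger than the offset BioRuby expects. That is my inference from the symptom, not something I have confirmed in either codebase. The FASTA data and the helper `read_record` are made up for the example.

```ruby
require 'stringio'

# Two tiny FASTA records; each record is 11 bytes long.
FASTA = ">seq1\nACGT\n>seq2\nTTGA\n"

# Read `length` bytes starting at absolute offset `pos` (hypothetical helper).
def read_record(io, pos, length)
  io.seek(pos, IO::SEEK_SET)
  io.read(length)
end

io = StringIO.new(FASTA)
stored_pos = 1   # assumed: the index stores 1 where the record starts at byte 0
length     = 11

shifted = read_record(io, stored_pos, length)      # loses own '>', gains next '>'
fixed   = read_record(io, stored_pos - 1, length)  # the complete first record
```

Seeking to `stored_pos - 1` is the same adjustment the diff makes to the `seek` call.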
Solution: I’ve hacked it up so that it works for me. If anyone else is having this problem, the diff from my changes is attached below. Sample usage:
Bio::FlatFileIndex.open('/path/to/the/database/directory') do |db|
  results = db.search("SPAC11H11.06") # My favourite pombe gene!
  p results # inspect the Bio::FlatFileIndex::Results object
end
Now I just have to figure out what to do with the Bio::FlatFileIndex::Results mess that is returned…
Edit: Upon further investigation, it appears that BioRuby was doing things to spec, and BioPerl is doing the weird stuff. Too bad I still need to access BioPerl-created flatfile databases from Ruby.
Diff:
Index: bioruby/lib/bio/io/flatfile/index.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v
retrieving revision 1.19
diff -r1.19 index.rb
561c561
< seek(pos, IO::SEEK_SET)
---
> seek(pos-1, IO::SEEK_SET)
1147,1148c1147,1148
< @config = BDBwrapper.new(@dbname, 'config')
< @bdb_fileids = BDBwrapper.new(@dbname, 'fileids')
---
> @config = hash.reject{|k,v| k.include?("fileid_") }
> @bdb_fileids = hash.reject{|k,v| !k.include?("fileid_") }
1196,1199d1195
< @config.close
< @config.open(*bdbarg)
< @bdb_fileids.close
< @bdb_fileids.open(*bdbarg)
1229,1232d1224
< if @bdb then
< @config.close
< @bdb_fileids.close
< end
1287c1279
< @fileids = FileIDs.new('', @bdb_fileids)
---
> @fileids = FileIDs.new('fileid_', @bdb_fileids)