C++ - Efficiently storing DNA base-pair data in RAM?
This is closely related to the question "Most efficient way to store a big DNA sequence?" and to this one: "Declaring a new data type for DNA".
I'd like to find a way to efficiently store long sets of the characters A, C, T, and G without wasting an entire byte on each value, when each should only require 2 bits. However, I don't see any descriptions in the responses of how to go about storing 2-bit data in C++, or Java, or any language for that matter, although I figure C++ should be the ideal language for it.
So my question is this: what is the syntax to create a conveniently usable 2-bit data type? I assume some sort of structure is going to need to be made to fill byte-sized (lol) chunks of data, but I'm not certain.
I'd also be interested in knowing whether such a thing is possible in other languages, such as JavaScript or Perl, but mainly how to go about it in C++.
Example code would be appreciated, thank you.
I suggest you encode your data in a std::bitset, and store the bitsets in a std::vector. You can then code one DNA pair in each bitset and waste 4 bits per element in the vector, or code 2 DNA pairs in each bitset and have perfect storage.
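To make the packing idea concrete, here is a minimal sketch of one way it could look. It is not the exact bitset layout suggested above: the size of a std::bitset is implementation-defined and on common implementations is rounded up to a whole machine word, so this sketch packs four 2-bit codes into each byte of a std::vector<std::uint8_t> instead. The class name PackedDna, its member names, and the A=0, C=1, G=2, T=3 encoding are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): stores the bases
// A, C, G, T as 2-bit codes, four bases per byte.
class PackedDna {
public:
    PackedDna() = default;

    explicit PackedDna(const std::string& bases) {
        for (char c : bases) push_back(c);
    }

    // Append one base; throws std::invalid_argument for anything but A, C, G, T.
    void push_back(char base) {
        const std::uint8_t code = encode(base);
        if (size_ % 4 == 0) bytes_.push_back(0);  // start a new byte
        const unsigned shift = static_cast<unsigned>(size_ % 4) * 2;
        bytes_.back() |= static_cast<std::uint8_t>(code << shift);
        ++size_;
    }

    // Read the base at index i; throws std::out_of_range if i >= size().
    char at(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("PackedDna::at");
        const unsigned shift = static_cast<unsigned>(i % 4) * 2;
        return "ACGT"[(bytes_[i / 4] >> shift) & 0x3u];
    }

    std::size_t size() const { return size_; }              // number of bases
    std::size_t byte_count() const { return bytes_.size(); } // bytes of payload

    // Decode the whole sequence back into text.
    std::string str() const {
        std::string out;
        out.reserve(size_);
        for (std::size_t i = 0; i < size_; ++i) out.push_back(at(i));
        return out;
    }

private:
    // Assumed encoding: A=0, C=1, G=2, T=3.
    static std::uint8_t encode(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw std::invalid_argument("expected one of A, C, G, T");
        }
    }

    std::vector<std::uint8_t> bytes_;
    std::size_t size_ = 0;
};
```

With this sketch, PackedDna dna("ACGTTGCA") keeps 8 bases in 2 bytes of payload, and dna.at(3) gives back 'T'. The trade-off is that every read and write costs a shift and a mask, and the sequence cannot hold ambiguity codes such as N without widening the encoding.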