This is merely a historical archive of the years 2008-2021, before the migration to mailman3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/osmocom-sdr@lists.osmocom.org/.
Bill Gaylord chibill110 at gmail.com

I will be starting by defining my format, then writing a converter to
convert between the text CSV and the binary format. I will also try to
write a stream-based converter, so that you can pipe the output of
rtl_power into it and have it write the binary format directly. I figure
this is more universal than trying to make a binary format in the
application itself.

On Thu, Jan 2, 2020 at 7:30 PM Hayati Ayguen <h_ayguen at web.de> wrote:
>
> Hi,
>
> I'd agree that having text encoding + compression is far from ideal.
>
> However, another aspect/goal might be the following:
> have the main data readable as binary from gnuplot.
>
> See
> http://gnuplot.sourceforge.net/docs_4.2/node103.html
>
> Kind regards,
> Hayati
>
> Am 02.01.2020 um 14:49 schrieb Müller, Marcus (CEL):
> > Hi Abhishek,
> >
> > On Fri, 2019-12-27 at 20:11 +0530, Abhishek Goyal wrote:
> >> In practice you will find that [text format + compression] will be
> >> fairly close to [binary format + compression] in final size.
> >
> > Is that so? Color me surprised! While certainly any dictionary-based
> > compressor could find the bytes that make up the individual digits and
> > compress them to an average of a little less than 4 bits per digit, at
> > two to three digits per value that'd still be worse than the 8 bits
> > you need to represent any number 0-255, for example. And if your
> > dictionary allows for variable-length words, like an LZ(W) kind of
> > algorithm, the compression ratio should saturate pretty early.
> >
> > Now, I haven't worked with the specific text data coming out of
> > rtl_power, so I'd be very interested in the results!
> > Bill, could you compress a few of your textual rtl_power output files
> > (using gzip --best, and maybe xz) for us, and tell us how many numbers
> > were in the original files and how many bytes long the resulting
> > files are?
> >
> > (zstd: it would be very interesting to have a detached dictionary,
> > because I presume the dictionary overhead to be non-negligible with
> > large numbers of smaller observation files.)
> > (BTW, tar is the worst format under the sun for compressing many small
> > files; it pads every file to 512 B; of course, zeros compress nicely,
> > but suddenly your shortest codeword is a useless padding symbol, and
> > that has a measurable effect on compressed file size.)
> >
> >> Compression obviously will reduce random access to data, so if your
> >> intended use involves seeking randomly around in the data, things get
> >> tricky.
> >
> > Indeed, that's what I'd have to bring forward: a "compressor" based on
> > simply converting the tabular text data to binary format would be not
> > so far away, worst case, from an actual entropy encoder, but would
> > allow for random seeks, AND be faster. I honestly don't see the
> > downsides of that!
> >
> >> If the format is intended to be shared with other people, and/or used
> >> to manage large collections of such data, either use hdf5[1] or look
> >> into it for inspiration.
> >
> > Yep; or other formats. GNU Radio, for example, simply uses raw binary
> > numbers packed end-to-end; there's the SigMF project, which strives to
> > provide metadata (sample format, acquisition time, and other
> > parameters) in a separate file. It's JSON lying next to your data
> > file. Whether or not that's useful to you...
> >
> >> If that sounds too complicated, then protobufs[2] might be another
> >> option. Both hdf5 and protobufs benefit from having readily available
> >> code for access to the data for later analysis, and from having
> >> hundreds of man-years of bugfixes and portability fixes behind them.
> >
> > Yeah, but a protobuf that's mostly a buffer of ints really is only
> > binary numbers right after each other, plus a header that you define
> > yourself. It's a good idea to let some library like protobuf handle
> > that, I agree!
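To make that concrete: below is a minimal sketch, in Python, of such a
self-defined format, with fixed-length little-endian rows behind a small
header carrying a magic string, a version field, and the row width. The
magic bytes, field names, and exact layout are hypothetical illustrations,
not an established rtl_power format.

    import struct

    MAGIC = b"RTLP"                  # hypothetical magic bytes
    HEADER = struct.Struct("<4sBI")  # magic, version, bins per row
    # "<" = explicit little-endian, no padding: settles the endianness
    # question once, in one place.

    def write_rows(path, n_bins, rows):
        """rows: iterable of (timestamp, [power_db, ...]) tuples,
        each with exactly n_bins power readings."""
        row_fmt = struct.Struct("<d%df" % n_bins)  # float64 ts + n_bins float32
        with open(path, "wb") as f:
            f.write(HEADER.pack(MAGIC, 1, n_bins))
            for ts, powers in rows:
                f.write(row_fmt.pack(ts, *powers))

    def read_row(path, index):
        """Random access: seek straight to row `index`, no text parsing."""
        with open(path, "rb") as f:
            magic, version, n_bins = HEADER.unpack(f.read(HEADER.size))
            assert magic == MAGIC and version == 1
            row_fmt = struct.Struct("<d%df" % n_bins)
            f.seek(HEADER.size + index * row_fmt.size)
            values = row_fmt.unpack(f.read(row_fmt.size))
            return values[0], list(values[1:])

Because every row has the same byte length, seeking to row i is a single
offset computation, which is exactly the random access that a compressed
text stream gives up; the version byte in the header covers the versioning
concern raised below.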
> >
> >> Again, depending on the use case, another option might be to store
> >> the data in a sqlite[3] database file; it's an underrated option for
> >> large amounts of data: here the binary conversion to and fro can be
> >> handled by the sqlite tools themselves, and you have access to the
> >> data in a fully random-access fashion. There are ways for sqlite to
> >> do online compression of data as well[4], in case you find the
> >> standard size reduction from going to binary isn't enough.
> >
> > Not quite sure how well sqlite handles compression of BLOBs, or are
> > you suggesting you insert samples as values individually?
> >
> > By the way, I think "so much data it becomes a burden to my server" is
> > actually not covered by what sqlite is designed to do. There might be
> > more optimized databases for that.
> >
> >> Greg brought up endianness; then there's framing (indicating the
> >> start of each "row" of data, in case a few bytes got corrupted on
> >> disk, thus allowing the rest to still be recovered),
> >
> > Which should be no problem, seeing that rows are fixed-length,
> >
> >> versioning (if you change the data format, how do you have the
> >> reading code still remain able to read the older format),
> >
> > Important point, imho, but this feels like a one-off format, so
> > really, it might be a bit of overengineering. Anyway, it never hurts
> > to simply have a header field that says "version". Do that!
> >
> >> debugging (it's very hard to be 100% sure that the binary data you
> >> read back is the data you intended to write; typically there will be
> >> no error messages from the code and no crashes - just wrong and
> >> misleading results due to miswritten/misread data), etc.
> >
> > In my experience, writing textual data is way harder to keep
> > consistent, due to the non-fixed number of bytes required per word.
> >
> > Best regards,
> > Marcus
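For the sqlite route, a minimal sketch along the same lines; the database
filename, table name, and schema here are illustrative assumptions, not an
established layout for rtl_power data:

    import sqlite3

    db = sqlite3.connect("scans.db")  # placeholder filename
    db.execute("""CREATE TABLE IF NOT EXISTS power (
                      ts       REAL,     -- acquisition time (Unix seconds)
                      hz_low   INTEGER,  -- lower edge of the bin in Hz
                      hz_high  INTEGER,  -- upper edge of the bin in Hz
                      power_db REAL      -- measured power in dB
                  )""")

    def insert_sweep(ts, hz_low, hz_high, readings):
        """Fan one rtl_power sweep row out into individual bins."""
        step = (hz_high - hz_low) / len(readings)
        db.executemany(
            "INSERT INTO power VALUES (?, ?, ?, ?)",
            [(ts, int(hz_low + i * step), int(hz_low + (i + 1) * step), p)
             for i, p in enumerate(readings)])
        db.commit()

    # Fully random access, e.g. every reading ever taken around 100 MHz:
    rows = db.execute(
        "SELECT ts, power_db FROM power WHERE hz_low <= ? AND hz_high > ?",
        (100_000_000, 100_000_000)).fetchall()

Here sqlite handles the binary encoding and the file layout itself; adding
an index on (hz_low, hz_high) would keep such queries fast as the table
grows.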