But lately, I’ve shifted to projects that focus on machine learning. I’m not a researcher, so I don’t focus on developing models and graphs, but over the past few years I’ve debugged, ported and improved a lot of the deep learning research code created here. My favorite framework is TensorFlow, and I have spent the most time with it, but I’ve also used PyTorch. TensorFlow’s graph-based structure forces more discipline and makes reasoning much easier than the Python spaghetti code I’ve seen from Torch users. The biggest NN project I’ve participated in so far was Denoising with Kernel Prediction. Implemented in TensorFlow, with custom CUDA code for better performance, this denoiser surpasses anything that came before it.
I still like C++; it is one of the most flexible and pragmatic languages in wide use. But for deep learning, Python is the standard, and inertia ensures this will remain the case for many years. That is unfortunate: the lack of type safety and static checking is a big annoyance, especially in non-trivial code bases like the ones I work in.
Using C++ for TensorFlow is possible, but it is only done for a very specific subset of tasks, such as embedding into other programs or writing custom ops for the CPU or CUDA.
<img /> tag. While Safari and other WebKit browsers support this usage, Firefox does not. Instead, SVG can be embedded directly into the source code of an XHTML file.
The above image is part of the source code of this page. It is included by a TextPattern plugin and only slightly processed. Processing is needed to remove the <?xml /> header and to insert a viewBox attribute, which allows the image to be scaled with CSS. With that done, Firefox displays and scales the image nicely. Safari, on the other hand, causes trouble: it does not correctly infer the viewport height from the width and the aspect ratio.
To be allowed to embed SVG data, the MIME type of the document has to be application/xhtml+xml or similar. For TextPattern, this has to be changed by editing the header() call in publish.php. The plugin code itself is rather simple. Download a version ready to be pasted into TextPattern (licensed under the MIT license). The source code is below.
svg_inline.php [1.74 kB]
TextPattern plugins are PHP functions that take two arguments: an array containing the tag attributes, and the contents of the tag element. All this plugin function does is read the specified SVG file and return the filtered source. A simple <txp:svg_inline src="imagepath" /> results in a nicely embedded SVG.
For further reading, see Paul Bourke’s page on stereo pair creation.
Cameras in OpenGL are defined by filling the modelview matrix and the projection matrix with values. The modelview matrix defines the position of the camera relative to the origin of the object space; the projection matrix defines how coordinates in space are mapped to the screen.
The projection matrix can be chosen freely, but normally two basic types of cameras are used: orthographic and perspective. Perspective cameras create projections very similar to how the human eye sees the world: objects appear smaller the further they are from the camera. Orthographic cameras project objects while preserving parallel lines and their proportions. They are mostly used in technical drawings.
A simple method to use perspective cameras to create stereoscopic footage is to converge their viewing axes. With hardware cameras, this is often used for macro recordings or recordings in closed rooms. The advantage is that the parallax plane is determined when recording, so post-processing needs are low. In addition, the cameras do not have to be as close together as in the next method. The biggest drawbacks are that the left and right sides of the image do not overlap and have to be cut away or ignored, and that the divergence behind the parallax plane is very strong and can easily lead to unfusable content. This method should be avoided whenever possible.
A better method is to use perspective cameras with parallel axes. It requires the cameras to be relatively close together and well aligned, both of which are easy to achieve in software. Unlike with converged cameras, the maximal divergence at infinity is fixed, so even recordings containing far objects can work. The zero-parallax plane lies at infinity. It can be moved by creating asymmetric view frustums, which effectively shifts both images horizontally.
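To make the asymmetric frustum idea concrete, here is a minimal sketch of how the near-plane bounds for each eye could be computed. It is not taken from the ExaminationRoom source; the names `Frustum`, `Eye` and `stereoFrustum` are made up for this example, and it assumes a vertical field of view in degrees and an eye that has already been moved sideways by half the separation.

```cpp
#include <cassert>
#include <cmath>

// Near-plane bounds in the order glFrustum expects them.
struct Frustum {
    double left, right, bottom, top, zNear, zFar;
};

enum class Eye { Left, Right };

// Hypothetical helper: frustum for one eye of a parallel-axis stereo pair.
// Assumes the eye itself is offset by +-separation/2 along the camera's
// right vector, so the frustum must be skewed back towards the center.
Frustum stereoFrustum(Eye eye, double fovYDeg, double aspect,
                      double zNear, double zFar,
                      double separation, double zeroParallax)
{
    assert(zNear > 0.0 && zFar > zNear && zeroParallax > 0.0);
    const double pi = 3.14159265358979323846;

    // Symmetric bounds from the vertical field of view.
    const double top = zNear * std::tan(fovYDeg * pi / 360.0);
    const double halfWidth = top * aspect;

    // Half the separation, scaled from the zero-parallax plane to the near plane.
    const double shift = 0.5 * separation * zNear / zeroParallax;

    // The left eye sits to the left, so its window moves to the right.
    const double sign = (eye == Eye::Left) ? 1.0 : -1.0;

    return { -halfWidth + sign * shift, halfWidth + sign * shift,
             -top, top, zNear, zFar };
}
```

The six values can be passed to glFrustum in the same order. With this choice, a point in the middle of the zero-parallax plane lands in the center of both images, and anything at infinity keeps a fixed offset.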
For special visualizations, cameras with parallel projection and converged axes can be used. As with converged perspective cameras, extreme caution has to be taken not to create strongly diverging images. This method should only be used to show objects that are very close to the parallax plane.
As part of ExaminationRoom, I implemented a flexible camera class. The source and header can be downloaded and used relatively freely. Like all of my code on this page, they are licensed under the GPL and MIT licenses. This class is not meant to be used directly in another project, since a lot of code is specific to ER, but I am sure the core can be of use as an example.
In my implementation, cameras are defined by their position, their viewing direction, their up-vector and their separation (the distance between the cameras). The projection is influenced by the field of view, the distance to the zero-parallax plane (the plane where the separation of corresponding points is zero) and of course the type of the projection.
camera.h [6.01 kB]
The core of the class is the creation of the matrices. The call to glFrustum sets the projection matrix; the modelview matrix is created with the utility function gluLookAt. The separation between the cameras has to be considered for both. The camera uses a vertical field of view, so that the height of the image does not change between standard and widescreen viewport aspect ratios.
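For the modelview side, the sketch below shows one way the arguments for gluLookAt could be derived from position, viewing direction, up-vector and separation. It is an illustration written for this text and not an excerpt from the downloadable file; `Vec3`, `LookAt` and `stereoLookAt` are hypothetical names, and it assumes parallel viewing axes.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 scale(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}
Vec3 normalize(Vec3 a) {
    const double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    assert(len > 0.0);
    return scale(a, 1.0 / len);
}

// The three vectors gluLookAt takes: eye point, point looked at, up-vector.
struct LookAt { Vec3 eye, center, up; };

enum class Eye { Left, Right };

// Hypothetical helper: shifts the eye sideways by half the separation and
// keeps the viewing direction unchanged (parallel axes).
LookAt stereoLookAt(Eye which, Vec3 position, Vec3 direction, Vec3 up,
                    double separation)
{
    const Vec3 dir = normalize(direction);
    const Vec3 right = normalize(cross(dir, up)); // camera's right vector
    const double offset = (which == Eye::Left ? -0.5 : 0.5) * separation;
    const Vec3 eye = add(position, scale(right, offset));
    return { eye, add(eye, dir), up };
}
```

The nine components of the result map directly onto the nine parameters of gluLookAt. Because the look-at point moves together with the eye, both cameras keep looking in the same direction instead of converging.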
camera.cpp [9.43 kB]
The perspective projection is used in most places. For ExaminationRoom, one of the feature requests was the ability to disable selected depth cues. A very strong cue is size relative to the environment. To disable this cue, parallel projection with converged cameras, as described above, is used instead. The values for the projection matrix were chosen so that objects at the zero-parallax plane do not change their size when switching between the projection types. The projection matrix is derived by shearing the normal orthographic projection created by OpenGL’s glOrtho.
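As a rough illustration of such a sheared orthographic projection, the following sketch builds the matrix by hand. It is my reconstruction of the idea and not the code from camera.cpp: `shearedOrtho` and `ndcX` are hypothetical names, the matrix is stored column-major as OpenGL expects, and it assumes the modelview matrix has already moved the eye sideways by half the separation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

enum class Eye { Left, Right };

// Hypothetical helper: orthographic projection (as glOrtho would create it
// for a symmetric volume) combined with a shear of x along z.
std::array<double, 16> shearedOrtho(Eye eye, double fovYDeg, double aspect,
                                    double zNear, double zFar,
                                    double separation, double zeroParallax)
{
    assert(zFar > zNear && zeroParallax > 0.0 && aspect > 0.0);
    const double pi = 3.14159265358979323846;

    // Half extents chosen so that objects at the zero-parallax plane have the
    // same size as in a perspective projection with this field of view.
    const double h = zeroParallax * std::tan(fovYDeg * pi / 360.0);
    const double w = h * aspect;

    // Shear factor: a centered point on the zero-parallax plane has eye-space
    // x = +-separation/2 and z = -zeroParallax, and must map to x = 0.
    const double k = (eye == Eye::Left ? 0.5 : -0.5) * separation / zeroParallax;

    std::array<double, 16> m{}; // column-major, zero-initialized
    m[0]  = 1.0 / w;
    m[5]  = 1.0 / h;
    m[8]  = k / w;                           // shear: x += k * z
    m[10] = -2.0 / (zFar - zNear);
    m[14] = -(zFar + zNear) / (zFar - zNear);
    m[15] = 1.0;
    return m;
}

// Normalized device x coordinate of an eye-space point (w is always 1 here).
double ndcX(const std::array<double, 16>& m, double x, double y, double z)
{
    return m[0] * x + m[4] * y + m[8] * z + m[12];
}
```

A matrix like this could be loaded with glLoadMatrixd while the projection matrix is selected. The shear keeps the two viewing axes crossing at the zero-parallax plane, which is what makes the cameras converged even though the projection itself is parallel.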
camera.cpp [9.43 kB]
Hopefully this is useful to someone :)
This StringWriter class reduces the overhead by aggregating string concatenations in a table and executing them when requested. It was originally designed to serve as an efficient drop-in replacement for files as created by io.open, but it can also be used standalone.
The class itself is built with a protected shared metatable and state inside a table. The state itself is not protected (that would be possible by using individual metatables or an internal database in a weak table, but this is more elegant). The metatable contains entries that redirect reads to the method table, redirect writes of new fields to nothing, and prevent changing or reading the metatable. The concatenation operator is also overloaded, but since it has value semantics and is not allowed to change the object itself, the implementation is less efficient than StringWriter:write(). Converting a StringWriter with tostring() gives the contained string, equivalent to StringWriter:get().
stringwriter.lua [4.77 kB]
The method table itself contains all methods the StringWriter supports. It was modeled after the file class, so many methods are placeholders that do nothing. The methods that are supported are seeking and writing. Seeking simply sets an internal position value. Writing in the context of files means overwriting and extending. When the position is at the end, the contents that are to be written can simply be appended to the contents table. Otherwise, the string has to be baked (concatenated into a single string), split, and recomposed.
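The append versus overwrite logic is independent of Lua, so here is a small sketch of the same idea in C++, the other language used on this page. It is not a port of stringwriter.lua: the member names are made up, positions are zero-based byte offsets, and clamping the seek position to the current length is an assumption of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class StringWriter {
public:
    // Seeking only stores a position (clamped to the current length here).
    void seek(std::size_t position) { pos_ = std::min(position, length_); }

    void write(const std::string& text) {
        assert(pos_ <= length_);
        if (pos_ == length_) {
            // At the end: appending a piece is all that is needed.
            pieces_.push_back(text);
        } else {
            // In the middle: bake, split around the overwritten range, recompose.
            const std::string baked = get();
            const std::size_t tailStart = pos_ + text.size();
            const std::string head = baked.substr(0, pos_);
            const std::string tail =
                tailStart < baked.size() ? baked.substr(tailStart) : std::string();
            pieces_ = { head, text, tail };
        }
        pos_ += text.size();
        length_ = std::max(length_, pos_);
    }

    // Bakes all pieces into one string, keeps the result, and returns it.
    std::string get() {
        if (pieces_.size() > 1) {
            std::string baked;
            baked.reserve(length_);
            for (const std::string& piece : pieces_) baked += piece;
            pieces_.assign(1, baked);
        }
        return pieces_.empty() ? std::string() : pieces_.front();
    }

private:
    std::vector<std::string> pieces_; // pending pieces, concatenated on demand
    std::size_t pos_ = 0;             // current write position
    std::size_t length_ = 0;          // total length of all pieces
};
```

Writing at the end stays cheap no matter how long the contents get, while a write in the middle pays for one full concatenation. That matches the trade-off described above: sequential writes are the fast path, seeking back is the slow one.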
stringwriter.lua [4.77 kB]
StringWriter instances are created by a factory method. It initializes the state and sets the metatable.
stringwriter.lua [4.77 kB]
I hope this code is useful for someone. Use it as you wish; it is licensed under the MIT license.
The persistence code here requires nothing but Lua’s standard io.open for reading and writing files. It can handle loops, multiple references to the same table in both keys and values, and most standard value types.
Not supported are userdata, threads and many types of functions. Exporting simple Lua functions works, but the exported byte code is not portable. The result of the export is itself Lua code: it can be executed and returns data structures equivalent to those that were exported.
The core of the export is a simple recursion with a dispatcher method and writers for all types. When unsupported types are encountered, nil is written. This can cause problems on import when those unsupported values are used as table keys, but in most cases it is more desirable than failing the export.
persistence.lua [5.50 kB]
To be able to export tables that are referenced several times (be it a cycle in the data structure, or just a table that is inserted several times), the structures that are to be written are examined first and the number of references to each table is counted.
All tables that have multiple references to them are created at the start of the export file before they are filled with content. This is required, since they could contain themselves or other multi-ref tables.
After all those temporary tables are created, they are filled with content. The writer for tables uses a lookup table for multi-ref tables: instead of creating a table constructor for them, they are assigned from the table created at the start. Last but not least, the passed arguments themselves are created in the same way.
persistence.lua [5.50 kB]
Loading the exported data is simple, but the provided method performs some error checking.
persistence.lua [5.50 kB]
I hope this code is useful for someone. Use it as you wish; it is licensed under the MIT license.