Monday, May 14, 2012

Fighting Author Existence Failure

I haven't been around much for the last 14 months. It'll be another 2-4 months, at the least, before I return to active duty. However, I can say with reasonable confidence I've avoided imminent existence failure. Expect blogging and work on Ix to resume whenever.

Monday, December 20, 2010

A new site for Ix

The Ix Language now has its own site, at http://danarmak.github.com/Ix/ . It contains the design docs, laid out with a TOC and allowing comments on every page.

It also contains a new blog just for Ix. Anyone who's interested, please subscribe there. I won't be posting more updates about Ix on this blog.

Monday, December 13, 2010

Ix memory model, objects and functions

Warning: this is superseded! Please visit the Ix site instead.

The information in this and future posts is likely to change many times as the Ix design matures. The purpose of these posts, apart from documenting my ideas, is to solicit comments, which hopefully will lead to improvements.

Significant changes in design will merit new posts; otherwise, old posts will be updated.

This post deals with the Ix memory model and the nature of objects, functions, and tasks. It introduces a lot of information rather quickly, as a preparation for future posts.

This post doesn't discuss syntax yet, just semantics.

Objects and functions

Everything in Ix is both an object and a function.

All objects support these operations, and no others:
  1. listattr - list attributes (names + metadata).
  2. getattr, setattr - get, set attribute value by name.
  3. call - call the object, as a function.
An object is, in theory, opaque; it may implement these operations in any way it chooses. In practice, most objects will be defined in the ordinary way - with a statically known, immutable list of attributes, enabling code completion, type inference, and efficient implementation.

A function is an object that implements the call() method and may have a few attributes provided by the runtime, such as a name. Objects not intended to be used as functions will not support the call() operation, but an object with custom attributes that also implements call() would be indistinguishable from a function with some attributes added.
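To make the four operations concrete, here is a minimal sketch in Python (not Ix, which has no syntax yet). The class name IxObject and its constructor arguments are my own illustration, not part of the design, and listattr here returns only names, leaving out the metadata.

```python
class IxObject:
    """Hypothetical model of an Ix object: four operations and no others."""

    def __init__(self, attrs=None, body=None):
        self._attrs = dict(attrs or {})  # attribute name -> value
        self._body = body                # Python callable, or None if not callable

    def listattr(self):
        # Simplification: names only, no metadata.
        return sorted(self._attrs)

    def getattr(self, name):
        return self._attrs[name]

    def setattr(self, name, value):
        self._attrs[name] = value

    def call(self, *args):
        if self._body is None:
            raise TypeError("this object does not support call")
        return self._body(*args)


# A "function" is just an object that implements call and has a name attribute.
double = IxObject({"name": "double"}, body=lambda x: 2 * x)

# A plain object has attributes but no call.
point = IxObject({"x": 1, "y": 2})

# Adding a custom attribute to the function does not change what it is.
double.setattr("author", "me")
```

In this model nothing distinguishes double from an ordinary object with a name attribute that happens to implement call, which is the point made above.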

Tasks

A running Ix program consists of one or more tasks, in the .Net meaning of the word: bits of code that can be run on native threads. Tasks are extremely lightweight in terms of the memory and time used to create a task. They are at least reasonably lightweight in terms of switching (scheduling).

A task holds references to one or more pipes. Pipes are used to send and receive objects between tasks. A task can have pipes connecting it with any child tasks it creates and with its parent task. References to other pipes can also be passed through pipes.
A task cannot influence the outside world or be observed by it except through pipes (but see below for IO, which needs special handling).

Mutability

The runtime maintains a mutability flag for each object. An immutable object can only refer to other immutable objects, and cannot be changed in any way. A type (of a name) can also be mutable or immutable: this simply means it does or does not include method calls that mutate the object (e.g., setters), similar to const variable types in C++.

Immutable objects can be shared between tasks freely (by passing them through pipes, or at task creation). Mutable objects must always be owned by exactly one task.

Changing mutability

A mutable object can be 'frozen' to become immutable. The owning task should prove to the compiler/runtime that no references to the object remain assigned to variables whose type is mutable. If the operation is forced without proving this, then future attempts to mutate the object will fail.

This is not the same as voluntarily restricting the type of a reference (name) from mutable to immutable - that would be a simple upcast. Mutability is a property of the object separate from the mutability of the type of a reference to it.

An immutable object can be 'unfrozen' to become mutable. The task asking to unfreeze it should prove to the compiler/runtime that no other task has a reference to the object. Otherwise, a runtime check must be performed for such references before making the object mutable and owned by the requesting task. (If no mechanism for proving such things is implemented, then the runtime check will always be performed.)
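Here is a rough Python model of the runtime side of freezing and unfreezing, with no compile-time proofs at all, so every check happens at runtime. The names Cell, holders and share are hypothetical bookkeeping invented for this sketch; a real runtime would track references in some other way.

```python
class Cell:
    """Hypothetical object with a runtime mutability flag and a single owner."""

    def __init__(self, owner, value):
        self.value = value
        self.mutable = True
        self.owner = owner        # owning task; None while frozen
        self.holders = {owner}    # tasks known to hold a reference

    def set(self, task, value):
        if not self.mutable:
            raise PermissionError("object is frozen")
        if task != self.owner:
            raise PermissionError("task does not own this object")
        self.value = value

    def freeze(self):
        # Forced freeze: later attempts to mutate simply fail.
        self.mutable = False
        self.owner = None

    def share(self, task):
        # Only immutable objects may be shared between tasks.
        if self.mutable:
            raise PermissionError("mutable objects cannot be shared")
        self.holders.add(task)

    def unfreeze(self, task):
        # Runtime check: no other task may still hold a reference.
        if self.holders - {task}:
            raise PermissionError("another task still holds a reference")
        self.mutable = True
        self.owner = task
        self.holders = {task}


cell = Cell("A", 1)
cell.set("A", 2)   # allowed: A owns the mutable object
cell.freeze()
cell.share("B")    # allowed now that the object is immutable
```

With task B holding a reference, cell.unfreeze("A") raises; once B drops its reference the same call succeeds and A becomes the owner again.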

Sending objects through pipes

An immutable object can be sent through a pipe by reference; this is very cheap.

A mutable object can be sent through a pipe by reference by transferring ownership of the object to the receiving task. The sending task should prove to the compiler/runtime that it is not keeping a reference to the object being sent. If it does not prove this and forces the operation, any future access to this object by the sending task will fail.
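A small Python sketch of the forced case, where nothing is proven up front and the sender simply loses access. The pipe is modelled with the standard queue.Queue; Owned, send and the task names are hypothetical.

```python
import queue


class Owned:
    """Hypothetical mutable object owned by exactly one task."""

    def __init__(self, owner, payload):
        self.owner = owner
        self.payload = payload

    def read(self, task):
        if task != self.owner:
            raise PermissionError("task no longer owns this object")
        return self.payload


def send(pipe, obj, sender, receiver):
    """Send obj by reference, transferring ownership to the receiver."""
    if obj.owner != sender:
        raise PermissionError("only the owner can send a mutable object")
    obj.owner = receiver   # ownership moves; no copy is made
    pipe.put(obj)


pipe = queue.Queue()
message = Owned("parent", [1, 2, 3])
send(pipe, message, "parent", "child")
received = pipe.get()
```

Here received is the very same object as message, but only the child task can read it now; any access by the parent fails.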

Cloning

Any object can be cloned (except for IO objects, see below). The language is aware of an object's structure and can always clone it directly; unlike in other languages, objects are clonable by default. An object may provide a method to modify the newly created clone (useful if an object e.g. manages unique IDs), but if at all possible, this method should not make the operation fail.

(I have considered outright forbidding the method from failing, but this seems on balance counterproductive.)

Ix supports shallow, deep, and in-between (user-visitor-controlled) cloning.

When creating a deep clone, it can be immediately made mutable or immutable as required, no matter what the mutability state of the original object.

Ix supports COW (copy on write) and can use it for cloning, among other uses.
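For comparison, this is roughly how shallow and deep cloning with a post-clone hook look in Python, using the standard copy module. The Node class and the after_clone hook name are made up for this sketch, and the hook runs only on the root of the clone, which is a simplification.

```python
import copy
import itertools

_next_id = itertools.count(1)


class Node:
    """Hypothetical object that manages a unique ID."""

    def __init__(self, children=None):
        self.id = next(_next_id)
        self.children = children if children is not None else []

    def after_clone(self):
        # Hook that adjusts a freshly made clone: give it its own ID.
        self.id = next(_next_id)


def clone(obj, deep=False):
    new = copy.deepcopy(obj) if deep else copy.copy(obj)
    new.after_clone()
    return new


leaf = Node()
root = Node([leaf])
shallow = clone(root)            # shares the children list with root
deep = clone(root, deep=True)    # gets its own copy of every child
```

The shallow clone still points at the original children, while the deep clone owns a separate copy of the whole tree. Both get a fresh ID from the hook.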

IO objects and tasks

The fundamental reason IO tasks must be tracked and managed is that we cannot generally implement transaction rollback for external IO, and transactions are a core feature of Ix.

Any task that communicates with things outside the process must be known to the runtime as an IO task.

Certain objects (probably only special ones provided by the runtime) which initiate IO - such as objects for opening files or sockets - are marked as IO objects. IO objects are always mutable, and any task that owns one is an IO task.

In well-designed Ix programs, IO should always be separated into its own tasks, which should be kept as small as possible. Minimizing 'IO contamination' will ensure that the maximum amount of code will still receive the benefits of transactionality.

Transactions

Ix uses COW (copy on write) extensively to provide transactions. This functionality may or may not become an STM (software transactional memory) mechanism; I need to learn more about STM implementations to grok fully how other people do this.

This part still needs to be fleshed out. To fully describe transactional behavior, I also need to finish writing the spec for method visibility, which in turn will require the spec for the typing system, which might turn out to be rather long. So this is a convenient point to stop for tonight :-)

More tomorrow!

Ix: assumptions and limitations

Warning: this is superseded! Please visit the Ix site instead.

Here are some fundamental assumptions, with rationale.

Different languages fit different tasks and spaces. No language is truly universal, and it's best that Ix doesn't try to be universal either... At least not in v0.1.

These assumptions, then, can be given as sufficient reasons not to include a feature. (I still have to do a post on Ix goals and targets.)

  1. Ix is imperative.
    Functional features are easier to add to an imperative language than vice versa. A pure functional algorithm can be written naturally in an imperative language with good support for functional programming, but an imperative algorithm is not at home in a functional language. And there are many imperative algorithms and data structures and programmers out there.
  2. Ix does not promise to closely match the type system of any other language.
    While Ix should collect existing good ideas, it should also be bold in discarding existing methods if we can do better. Ix can include good FFIs, but they will probably have some performance penalty. Ix will not be able, at first, to trivially expose Ix objects to in-process code written in other languages.
  3. Ix will attempt not to bind itself too tightly to any one runtime or platform (e.g. VM).
    New compiler backends will be welcome. Highly platform-specific features will be placed in platform-specific libraries rather than the standard APIs. Features that work on most but not all platforms will be placed in the main library, and will indicate to the user if they are unsupported. (This is the Python approach to the subject.)
  4. Ix will not sacrifice usability for performance.
    Our performance goal is being able to replace the 95% of software that uses 5% of CPU time. In practice, good library and application design can do wonders for performance in any domain that does not have to run time-consuming calculations.
  5. Ix will tend to use 'mainstream' syntax for mainstream features unless there is a cost to usability.
    The trivial things everyone is used to should not change without good reason: ';' for statement termination, quote marks for string literals, '.' for attribute access, {} for blocks of code, () for function invocation, <> or [] for metadata.
    Ix may introduce indent-delimited syntax in the future, but not in the very first versions. Such an addition would in any case not change the semantics of existing code, so it can be deferred. (Rationale: it's a pain to spec and code in the parser. Yes we know how to do it. That doesn't mean we have to.)
  6. The Ix design will rely a lot on generating custom code at runtime.
    Any target platform must support that, ideally with the same performance as precompiled code (by using JIT compilation to begin with, or by running an offline compiler and loading the resulting module).

Ix: it's time to get this show on the road

Please visit the new Ix site!


It has been a long road, but I'm finally ready to publicly announce Ix: a new programming language.

Ix does not exist yet. It was previously called Synth, and it did not exist then, either. I am starting to design and develop it now.

It will be a new and wonderful language, truly, with kittens in it, and the only way it could possibly fail is if I give up on it. I'm taking measures to prevent that from happening.

Future posts here will explain why I think the world needs another language, what will be different and new and exciting about it, and lots and lots of technical details. If all goes well, I will eventually set up a dedicated site for Ix, and move these posts to a wiki or something.

I welcome readers one and all - anyone who enjoys PLD (programming language design), or programming in general. I sincerely hope to have enough meat in future posts to interest people, even if they never use Ix itself.

Let the coding begin!

Tuesday, January 5, 2010

Datahand halves can use extender cable

A quick tip for fellow Datahand owners: the cable connecting the two halves is a standard 15-pin serial cable. I used a standard 3 meter extender cable and it works great. Just in case you're wondering, like I was before buying.

Saturday, January 2, 2010

How to enable DRI + XVideo with several video cards ("vga arb" problem)

I hope this post comes in handy for someone searching for info.

I have a Radeon HD4850 card, which works great with the free radeon (aka xf86-video-ati) driver - which is to say DRI and XVideo work great, and that's all I need. (I do my gaming in Windows.)

Recently I bought a second 4850 card and set them up in dual Crossfire mode. DRI and XV stopped working, saying in the xorg log that DRI was not supported with "vga arb" (i.e. vga arbitration between several video cards). I only wanted to use one card - no crossfire - but the mere presence of another card prevented DRI from working.

Some Googling revealed that this is a known condition, which holds for DRI with any two cards with any drivers (even when only one card is active). It's not going to be fixed soon, apparently, and presumably when it *does* get fixed it'll be for DRI2 only.

The solution? Disable the second card completely, on the PCI bus level. First I find out the PCI bus ID of the cards:


[danarmak@planet data]$ lspci|grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4850]
02:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4850]
Note that the main bus IDs are 1 and 2, because these are PCIe cards, and each PCIe slot has its own PCI bridge.

And here's how to remove a card from the system:

echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
After executing that, the card is gone from lspci's output and the directory /sys/bus/pci/devices/0000:02:00.0 is gone as well. If you want it to reappear (without rebooting), do:

echo 1 > /sys/bus/pci/rescan
Googling will give you refs to other names in /sys, which I don't have. I assume the names changed in a recent kernel version. As of 2.6.32, the above 'remove' file is the one to use.

Thursday, January 17, 2008

Quick hit: Btrfs

Btrfs (which I pronounce as Betterfs) is a new Linux filesystem with many of ZFS's design sensibilities. It's being developed by Chris Mason from Oracle and was announced about half a year ago - the lkml announcement thread has some interesting posts.

How did I miss this until now? :-? Regardless, this is great news for Linux and Linux users. With ZFS entering the BSDs and OSX, Linux looked like it was going to be left behind. This is some great tech - go read about its features. I can't wait for it to become stable/mainstream; sadly that will probably be another year or so. Filesystems need a lot of testing before production use.

Have I said I really like the features of btrfs (and ZFS) yet? If I had to write out the features of my dream filesystem, these would constitute a big part.

Friday, January 11, 2008

War on Bad Software Paradigms

(Moved here from the introduction to the Total Saving post, because most people wouldn't be interested in this part.)

Most common programs available today are buggy and badly designed and implemented. But these are relatively minor issues - by dint of lifelong indoctrination we manage to tolerate and even ignore their failings in everyday use.

Bad OS and UI paradigms are far worse than mere bugs or design problems. Almost all end-user programs that exist today are built on many layers of fundamental architectural mistakes. Some of them made better sense when they were introduced with Unix and the Internet, thirty or forty years ago. All have remained unexamined and unchanged in mainstream software ever since.

I've had enough of bad computing paradigms. This is my declaration of war on stupid software. Every year we spend man-aeons on faster, safer, more featureful and glittering implementations of the same fundamentally broken designs. We build mousetraps so good that only intelligence-augmented rats can avoid them, but it's time to remember that what we really wanted to catch was a dragon.

This is the first in a planned series of posts that will outline the basic design issues as I see them with some common classes of end-user software - editors, file managers, browsers and the like - and in about eight months (when I plan to go to university) I'll start work on implementing these ideas in earnest. I feel I wouldn't really enjoy working on anything else before I build myself a set of better tools to do it with.

It's said that every aspiring C programmer wants to write their own OS. I prefer Python (which is still far from perfect) and would rather write my own UI - preferably, one that would allow a very high percentage of my computer use with just a few simple but powerful paradigms. Is that just a dream? Time will tell.

And so, on to the first post about Bad Software Paradigms: manual data management vs. Total Saving.

Space costs of Total Saving

Yesterday I discussed my Total Saving approach with a friend, who raised an objection: surely the storage costs would be prohibitively high?

In my last post on the subject I wrote: "hard disk sizes and costs have long reached a point where [removing user-created data to free space] is completely irrelevant for all the text and text-based data a person can produce in a lifetime." It's time to substantiate this claim with some hard numbers.

Suppose I type at 10 cps (characters per second). This is a rate few people can exceed even in ten-second bursts, but let us suppose I can keep it up indefinitely. I type at 10 cps, without stopping to think, eat, or sleep, for a year.

That gives us 10 (cps) * 3600*24*365 (seconds in a year) = 315,360,000 characters typed per year.

Suppose I store every keystroke in a log. A log record consists of the character typed, a timestamp, and maybe a few delimiters and such. A timestamp looks like "1200054174.237716", so we come to around 20 characters in all. This brings us to 6.3 GB of data per year.

HD costs are around 4GB per US$, so, for $1.5 a year you can store all your data. Of course, you can't get HDs that small, so I'll put this another way: if you can afford to buy even the smallest disk produced today (probably 40GB), it will last you for six years. By then, the smallest disk will probably be large enough to last you the rest of your life. (With 2008 lifespans, anyway.) Other storage solutions, including online ones, offer prices not much higher (Gmail can store almost that much for free), and with data creation pegged at 200 bytes/sec, bandwidth won't be a problem.
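For anyone who wants to check the arithmetic, here it is as a few lines of Python. The inputs are the figures from the text; the only assumption I've added is that 1 GB means 10^9 bytes.

```python
CPS = 10                             # characters typed per second, nonstop
SECONDS_PER_YEAR = 3600 * 24 * 365
BYTES_PER_RECORD = 20                # character + timestamp + delimiters
GB_PER_DOLLAR = 4                    # 2008 hard disk prices
SMALLEST_DISK_GB = 40

chars_per_year = CPS * SECONDS_PER_YEAR
bytes_per_year = chars_per_year * BYTES_PER_RECORD
gb_per_year = bytes_per_year / 1e9   # assumption: decimal gigabytes
dollars_per_year = gb_per_year / GB_PER_DOLLAR
years_on_smallest_disk = SMALLEST_DISK_GB / gb_per_year
bytes_per_second = CPS * BYTES_PER_RECORD

print(f"{chars_per_year:,} characters per year")
print(f"{gb_per_year:.1f} GB per year, about ${dollars_per_year:.2f}")
print(f"a {SMALLEST_DISK_GB}GB disk lasts {years_on_smallest_disk:.1f} years")
print(f"{bytes_per_second} bytes/sec of new data")
```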

Of course the real storage room requirements would be smaller by an order of magnitude. We can start by saving every word typed, or every 10 characters, saving a lot of log record overhead. We can store the timestamps as time offsets, which would be far smaller, and use coarser resolution (we don't really need microsecond precision here). We can compress the whole thing, which ought to produce savings of 50%-60% at least. There are many higher-level techniques we can apply to further reduce the space used, like only storing identical document-states once, but the point is they're not really necessary: we can easily keep well below 1GB a year, probably well below 100MB in practice, and everyone who uses a computer can afford that.
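A quick Python sketch of the first three savings: one record per word instead of per keystroke, millisecond offsets instead of full timestamps, and zlib compression on top. The record layouts and function names are invented for this illustration, and the sample text is highly repetitive, so the compression ratio it shows is far better than real typing would give.

```python
import zlib


def keystrokes(text, start_us, gap_us):
    """Fake input: one (timestamp in microseconds, character) pair per key."""
    return [(start_us + i * gap_us, ch) for i, ch in enumerate(text)]


def naive_log(events):
    """One record per keystroke with a full timestamp: 20 bytes each."""
    return "".join(
        f"{ch}\t{us // 10**6}.{us % 10**6:06d}\n" for us, ch in events
    ).encode()


def compact_log(events):
    """One record per word, with a millisecond offset from the previous word.

    Assumes events is non-empty.
    """
    out = []
    word = ""
    word_start = prev_start = events[0][0]
    for us, ch in events:
        if not word:
            word_start = us
        word += ch
        if ch == " ":
            out.append(f"{(word_start - prev_start) // 1000}\t{word}\n")
            prev_start = word_start
            word = ""
    if word:  # flush a trailing partial word
        out.append(f"{(word_start - prev_start) // 1000}\t{word}\n")
    return "".join(out).encode()


text = "the quick brown fox jumps over the lazy dog " * 50
events = keystrokes(text, 1200054174237716, 100_000)  # 10 keys per second
naive = naive_log(events)
compact = compact_log(events)
compressed = zlib.compress(compact)
print(len(naive), len(compact), len(compressed))
```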

But wait, said my friend, this would be too inefficient. If the editor had to read and "simulate" a year's worth of keystrokes just to open a document, wouldn't that be too slow to be practical?

Not necessarily, but I'll grant that some documents and document-types require a faster approach. No fear - since our actual storage techniques allow random-access writing rather than just appending to a log, we can store the current document as a state-dump suitable for quick loading into an editor (like the complete text of a text file, to start with), while storing reverse diffs for the complete history. (This, too, has been demonstrated in VCSs and elsewhere.) Since we're only storing one dump (or one dump per period of time, say one weekly), this shouldn't take up too much space.
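Here is a minimal Python sketch of the idea: keep the current text as a full dump and store, for each save, a reverse diff that turns the newer version back into the older one. It uses difflib.SequenceMatcher from the standard library to compute the diffs; the History class and the delta format are my own invention, not how any particular VCS does it.

```python
import difflib


def make_delta(src, dst):
    """List of (start, end, replacement) edits that turn src into dst."""
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (i1, i2, dst[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(src, delta):
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(src[pos:start])   # unchanged text before the edit
        out.append(replacement)
        pos = end
    out.append(src[pos:])
    return "".join(out)


class History:
    """Current document as a full dump, older versions as reverse diffs."""

    def __init__(self, text=""):
        self.current = text
        self.reverse_deltas = []     # oldest first

    def save(self, new_text):
        # Store how to get from the new version back to the old one.
        self.reverse_deltas.append(make_delta(new_text, self.current))
        self.current = new_text

    def version(self, steps_back):
        if not 0 <= steps_back <= len(self.reverse_deltas):
            raise ValueError("no such version")
        text = self.current
        start = len(self.reverse_deltas) - steps_back
        for delta in reversed(self.reverse_deltas[start:]):
            text = apply_delta(text, delta)
        return text


history = History("hello")
history.save("hello world")
history.save("hello, world!")
```

Opening the document reads history.current directly, with no replay at all; only a request for an old version pays for applying diffs, one per step back.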

Or would it? My friend pointed out that MS Office documents were huge, often running into the multiple megabytes. My natural reaction, of course, was to ridicule MS Office, but we did conduct an anecdotal-level test.

The biggest MS Word doc we could find was about 16MB large, and contained 33 pages. By removing all Visio drawings and embedded images, we reduced it to 31 pages of pure text (with some formatting and tables), but it was still 6.5MB large. We opened it with OO.o Writer 2.3 and saved as ODF (which disrupted a lot of formatting and layout finetuning), but that still took over 3MB. We finally copied-and-pasted the whole text into a new Writer document, choosing to paste as HTML. The resulting document was only 17KB large. Adding back the lost formatting presumably would make it grow, but nowhere near a multi-megabyte size. The large size of the first ODF might be explained at least in part by Writer preserving Word cruft that wasn't visible from the editor itself.

Of course, the ODF format used by Writer isn't very efficient, even zipped (neither is any other XML-based storage). 17KB, although compressed, is more than the raw text of the document uncompressed. But even so, it would be good enough for the purposes of total storage.

There's an assumption underlying this discussion: that the great majority of data produced by humans is textual in nature. Of course photos and sound and video recordings produce far more data than the amounts described here. Total storage can't be applied to them, at least not with current storage costs.

But even there, some things can be improved. When I record a movie, I may be producing multiple megabytes of data per second. When I edit a movie and apply a complex transformation to it, which changes all its frames at maybe a frame a second, it might seem that I'm producing a frameful of data per second, which can still come out on the order of MB/sec. But in reality, all the new data I've produced is the 100 or 200 bytes of processing instructions, divided by the ten hours it takes to transform the entire movie. The rest was accomplished by an algorithm whose output was fully determined by those instructions and the original movie. If our log stored only these, we could still reproduce the modified movie. (Of course, it would take us 10 hours.)

Put it another way: today we pass around CGI-rendered movies weighing GBs, when the input models used to render them might be an order of magnitude smaller (this is only an uneducated guess on my part: perhaps they are in reality an order of magnitude larger - but bear with me for a moment). These generated movies don't contain any more information than the models and software used to produce them; they merely contain more data. One day our computers will be able to generate these movies at the speed we watch them. And then only the input data will be passed around.

Today we download large PDFs, but we could make do with the far smaller inputs used to generate them if only all our recipients had the right software to parse them.

Real-time movie rendering may be a way off yet. But total saving of the complete input we're capable of sending through our keyboards, mice, and touchscreens isn't. We just need to write the correct software. This state of things will last until brain-machine interfaces become good enough to let us output meaningful data at rates far exceeding a measly ten characters a second, or even the few hundred bytes per second we can generate by recording the precise movement of pointing devices. Even then, the complete log of your mouse movements is rarely as interesting as the far smaller log of your mouse clicks and drags-and-drops. The whole MMI field is still in its infancy.

In theory, if you record all the inputs of your deterministic PC since a clean boot, you can recompute its exact state. In practice, total saving of text-based input will suffice - for now.