psmv4: Storage schemas

Somewhere in reference 1 I talked of the brain needing to assemble stuff from all over the brain in order to make up the conscious experience.

Then it occurred to me this morning that there may be a link to the storage schemas of databases on computers, important in the days when I knew about such things, back in the 1980s, when the size of databases was outstripping the speed of the computers needed to search them.

Suppose we have a database containing a record for every person in the land, the sort of thing mentioned at reference 2 in connection with keeping track of migrants. When it is not active, such a database might well be stored on some kind of disc, optical or magnetic.

Now disc storage is essentially serial or linear. Records are placed on the disc one after another and it makes sense to access the next record or even the 475th record, working down the page in the yellow block in the sketch above. This is fine when the records have some natural order, perhaps given by some identifier, the sort of many-digit reference number that record keepers are fond of, and when one often wants to process all the records. Just plough through them, one after the other.

It is not so fine otherwise, when one might want to process the records in some other order or to process just some small part of the totality. Perhaps in the case of a database of people, in order of occupation. Do all the bakers, then all the candlestick makers, then all the dungaree makers. Or perhaps, for some reason or other, just the candlestick makers. Everyone else can wait for another day.

For these purposes one adds indexes to one’s database. So an occupational index might collect up all the person serial numbers occupation by occupation. So when you say give me all the candlestick makers, the computer can do that without having to go through all 100 million records, or however many there might be.
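By way of illustration, a minimal sketch in Python of the difference between the serial scan and the indexed lookup. The records and the names (`records`, `scan_for`, `occupation_index`, `lookup`) are all invented for the purpose and have nothing to do with any real database product.

```python
from collections import defaultdict

# The data file: records held in serial-number order (all values invented).
records = [
    {"serial": 1001, "name": "Alice", "occupation": "baker"},
    {"serial": 1002, "name": "Bob", "occupation": "candlestick maker"},
    {"serial": 1003, "name": "Carol", "occupation": "baker"},
    {"serial": 1004, "name": "Dave", "occupation": "dungaree maker"},
    {"serial": 1005, "name": "Eve", "occupation": "candlestick maker"},
]


def scan_for(occupation):
    """Serial scan: look at every record in the file."""
    return [r["serial"] for r in records if r["occupation"] == occupation]


def build_occupation_index(recs):
    """Occupational index: serial numbers collected occupation by occupation."""
    index = defaultdict(list)
    for r in recs:
        index[r["occupation"]].append(r["serial"])
    return dict(index)


occupation_index = build_occupation_index(records)


def lookup(occupation):
    """Indexed access: go straight to the serial numbers wanted."""
    return occupation_index.get(occupation, [])


print(lookup("candlestick maker"))
```

Both routes give the same answer. The point is that `lookup` does not have to touch the records it is not interested in, which matters rather more with 100 million records than with five.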

This seems to work better than the alternative of keeping lots of copies of the database, each copy in its own order. First, because access to such a copy, while better than a serial scan, is not as fast as indexed access. Second, because maintaining lots of copies of the data is more complicated than maintaining one copy. One copy of the data serving everybody’s needs being one of the principles driving the development of databases in the first place.

This is sketched in the figure above, where we have the data file in yellow right – with a rather lazy long reference number in the first column – and just two indexes in green left. Note that the indexes, while they have as many entries as there are data records, take up much less space than those records. Access is faster and maintenance is cheaper.

The specifications of all these indexes were stored in something called the storage schema, which described how the data was actually stored on the disc. The schema proper, by contrast, described the logical organisation of the data. Most of the time the users of the data, the computer programmers, only needed to bother about the latter, a useful reduction of complexity.
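A rough sketch in Python of that division of labour, very much a toy and not a picture of how any particular database product did it. The names `SCHEMA`, `STORAGE_SCHEMA`, `build_indexes` and `find` are invented for the illustration, as is the data.

```python
# Logical schema: what the programmer sees (field names only).
SCHEMA = ("serial", "name", "occupation")

# Storage schema: how the data is held, including which fields are indexed.
STORAGE_SCHEMA = {"indexed_fields": ["occupation"]}

# The data file (all values invented).
records = [
    (1001, "Alice", "baker"),
    (1002, "Bob", "candlestick maker"),
    (1003, "Carol", "baker"),
]


def build_indexes(recs):
    """Build one index (value -> list of positions) per field the storage schema names."""
    indexes = {}
    for field in STORAGE_SCHEMA["indexed_fields"]:
        col = SCHEMA.index(field)
        index = {}
        for pos, rec in enumerate(recs):
            index.setdefault(rec[col], []).append(pos)
        indexes[field] = index
    return indexes


INDEXES = build_indexes(records)


def find(field, value):
    """Same call whether or not an index exists; only the route taken differs."""
    col = SCHEMA.index(field)
    if field in INDEXES:
        return [records[pos] for pos in INDEXES[field].get(value, [])]
    return [rec for rec in records if rec[col] == value]


print(find("occupation", "baker"))  # answered from the index
print(find("name", "Bob"))          # answered by serial scan
```

The programmer calling `find` only needs to know the field names. Adding `"name"` to the storage schema would change how the second query is answered, but not how it is asked.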

So let us now suppose that the memory in the brain stores lots of small facts, and in the interests of efficiency it stores just one copy of each fact, with the memory being something like a disc file. One can just plough through the whole file, or, more likely, one wants to have indexes.

So one index might give me all the beef dinners I have ever had. Another might give me all the spectacles I have ever bought. And yet another might be about time and could give me all the stuff which happened at some particular time. By working with two indexes, one for time and one for place, I could get everything that happened at some particular time and place.
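Working with two indexes amounts to taking what the two have in common. A small sketch in Python, with invented fact numbers and invented index contents, and with `facts_at` a made-up name:

```python
# Each index maps a key to the set of fact numbers filed under it (all invented).
time_index = {
    "Monday": {1, 2, 3},
    "Tuesday": {4, 5, 6, 7},
}
place_index = {
    "Woking": {2, 5, 7},
    "Epsom": {1, 3, 4, 6},
}


def facts_at(time, place):
    """Facts appearing in both the time index and the place index."""
    return sorted(time_index.get(time, set()) & place_index.get(place, set()))


print(facts_at("Tuesday", "Woking"))
```

Here facts 5 and 7 are the only ones filed under both Tuesday and Woking, so they are what comes back.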

Except that sometimes these indexes go wrong and the odd pork dinner gets included among the beef dinners. Or a fact from Monday gets mixed up with the facts from Tuesday, giving rise to a rather confusing picture when all the supposedly Tuesday facts for Woking get assembled into a conscious experience.

I need to ponder about what, if anything, this analogy is good for!

Reference 1: https://psmv4.blogspot.com/2021/02/binding-problem.html.

Reference 2: https://psmv4.blogspot.com/2021/02/gross-flows.html.

Sunday 7 February 2021
