A MySQL Time Machine

N.B. This approach is required for MySQL 5.1 (or earlier) and will likely be needed for newer versions of the RDBMS as well

Did Doc Emmett Brown not invent time travel 10 years before MySQL was initially released?

He did, and the laws of known physics haven’t been broken yet. What I am explaining here is how to represent the temporal state of an entity (e.g. an account) so that its historical state can be queried for any given point in time and its future state(s) can be set in advance.

 

image

In the above illustration the Account is active at the timestamp of the query; green represents an active account.

The tables used in this effort will look similar to these:

image

Account table

image

Account actions table

The account table has the usual fields:

  • Account_ID
  • Name
  • Credit card number
  • etc

The Account_actions table records:

  • A unique action ID
  • An Action_Type – this is an enumeration of an account action
  • An Action_Date – the point in time when this account action comes into effect
  • Created_Date – the date that this action was created

Can I not simply do this with a JOIN and a WHERE?

Oh like this?

SELECT
    Account.Name, Account_actions.Action_Type
FROM
    Account
JOIN
    Account_actions
ON
    Account.Account_ID = Account_actions.Account_ID
WHERE
    Account_actions.Action_Date < NOW();

In the above example we hope to obtain the account name from the Account table along with the most recent Action_Type from the Account_actions table. This will produce erroneous results, as the JOIN operation does not apply any ordering to the rows of the joined table, even if we were to add an ORDER BY.

In most cases the Action_Type returned would likely come from the first row stored in the Account_actions table within the subset of data created by that WHERE clause.

It is entirely possible that this erroneous implementation could lie in wait, ready to trip up even experienced SQL query craftspeople, as it appears to be a logically correct command.

Is there a solution?

Yes: in place of joining a pre-existing table we JOIN a derived table generated by an inline SELECT, which has the necessary sorting and filtering already in place.

So to correctly implement this goal we would use the following command

SELECT
    Account.Account_ID, Generated_account_actions.Action_Type
FROM
    Account
JOIN
    (
        SELECT
            Account_actions.Account_ID, Account_actions.Action_Type
        FROM
            Account_actions
        WHERE
            Account_actions.Action_Date < NOW()
        ORDER BY
            Account_actions.Action_Date DESC
    ) AS Generated_account_actions
ON
    Account.Account_ID = Generated_account_actions.Account_ID
GROUP BY
    Account.Account_ID;

This query uses the results of the filtered and ordered derived table (aliased as Generated_account_actions) to perform the JOIN and, by grouping on Account_ID, returns the most recent action state for each account (in MySQL 5.x the GROUP BY keeps the first row of the ordered derived table for each group).
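For completeness, here is a rough sketch of how that corrected query might be issued from application code over JDBC (the connection URL, database name, credentials and the presence of the MySQL Connector/J driver on the classpath are assumptions made purely for illustration):

import java.sql.*;

public class LatestAccountAction {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details - adjust for your own server
        String url = "jdbc:mysql://localhost:3306/accounts_db";

        String sql =
            "SELECT Account.Account_ID, Generated_account_actions.Action_Type " +
            "FROM Account " +
            "JOIN ( SELECT Account_actions.Account_ID, Account_actions.Action_Type " +
            "       FROM Account_actions " +
            "       WHERE Account_actions.Action_Date < NOW() " +
            "       ORDER BY Account_actions.Action_Date DESC " +
            "     ) AS Generated_account_actions " +
            "  ON Account.Account_ID = Generated_account_actions.Account_ID " +
            "GROUP BY Account.Account_ID";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Current (most recent past) action state for each account
                System.out.println(rs.getString("Account_ID") + " -> " + rs.getString("Action_Type"));
            }
        }
    }
}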

I hope this post helps you when making your next killer MySQL-backed app; these queries could be further performance-optimised, but my aim here is to illustrate the point.

Semantic Web knowledge – An introduction

This is a simple introduction; for more information read the further reading links at the bottom, and if you like what you read consider enrolling in a Masters-level degree in Computer Science/Computing (seriously, this was the most rewarding thing I have ever done).

 

A semantic web, is that some sort of crazy net that sailors use to catch fish?

No, that is a fishing net. To explain the ideas behind the semantic web I must first do 2 things:

  1. Provide an example of a semantic sentence as opposed to one that is merely syntactically valid
  2. Explain what a semantic web could be and provide an example of semantic reasoning

 

Spot the difference

If someone were to ask me what my favourite colour was I might respond with something like:

My Favourite colour is green

This is a semantically valid sentence, it has a clear meaning and is constructed in the following manner:

[pronoun] [adjective] [noun] [verb] [noun]

This arguably forms a technically correct English sentence. However, if we were to use a different noun in place of green, for example Kilbeggan, then the sentence would not make any sense or hold semantic meaning, yet it would still be syntactically valid. We were talking about a colour, not something delicious like dark chocolate, a good steak or a bottle of Kilbeggan, which each represent my favourite element of a different collection (e.g. whiskey).

My favourite colour is Kilbeggan

This is a very simple example of meaning and how it corresponds to the understanding of an overall concept. If a person or intelligent agent did not know that Kilbeggan is not a colour, and so an invalid choice for this sentence, there would be a possibility of this simple non-sentence being accepted as fact by that person/intelligent agent.

Classification via semantic association provides agents with the related properties of terms and words so that they can correctly determine whether a sentence is valid and, additionally, infer meaningful information/facts from a pool of data (known as an ontology).
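As a toy illustration of this idea, the sketch below uses the Jena Java API (which appears again later in this blog) to check whether a term is classified as a colour before accepting the sentence; the example.org namespace and the Green/Colour/Kilbeggan resources are invented purely for this example.

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.RDF;

public class ColourCheck {
    static final String NS = "http://example.org/demo#"; // assumed namespace

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();

        // A tiny pool of facts: green is classified as a colour, Kilbeggan is not
        m.add(m.createResource(NS + "Green"), RDF.type, m.createResource(NS + "Colour"));

        // Accept "My favourite colour is X" only if X is known to be a Colour
        System.out.println(isColour(m, "Green"));     // true  - the sentence holds meaning
        System.out.println(isColour(m, "Kilbeggan")); // false - reject the non-sentence
    }

    static boolean isColour(Model m, String term) {
        return m.contains(m.createResource(NS + term), RDF.type, m.createResource(NS + "Colour"));
    }
}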

 

So is the goal to give meaning to every single word on the web?

That’s a tough one. Some would argue that such a scenario would be ideal, but I believe that’s not really an immediately feasible goal, so it would be preferable to build semantic webs that represent knowledge about specific domains, for example a database of medical terms and conditions which can be independently understood by computer agents.

Semantic webs are based on representing knowledge using a collection of Subject::Predicate::Object (also known as Entity-Attribute-Value) statements/tuples.

Three such SPO tuples in RDF notation are:

Statement 0: (#Joe, rdf:type, #Student)
Statement 1: (#Student, rdf:type, #Human)
Statement 2: (#Human, rdf:type, #Animal)

These Statements represent the following facts

Statement 0: Joe is a Student
Statement 1: A Student is a Human
Statement 2: A Human is an Animal

By storing these 3 EAV-based facts in a semantic database system we can write queries/rules to infer facts about entities; for example, using just these 3 statements you can successfully determine either:

Inference 0: Joe is an Animal | which in RDF is | (#Joe, rdf:type, #Animal)
Inference 1: Joe is a Human | which in RDF is |  (#Joe, rdf:type, #Human)

The reasoning outcome depends on the structure of the inference rule and the desired goal; such inference rules could be:

Inference Rule 0: [rule0: (?z rdf:type ?y) (?y rdf:type ?x) (?x rdf:type ?w) -> (?z rdf:type ?w)]

Inference Rule 1: [rule1: (?z rdf:type ?y) (?y rdf:type ?x) -> (?z rdf:type ?x)]

 

These rules are written in Jena's rule syntax (similar in spirit to SWRL), but could also be expressed using SPARQL (Jena is a semantic web framework).
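To make this concrete, here is a minimal sketch of feeding those three statements and Inference Rule 1 to Jena's GenericRuleReasoner using the Jena 2.x Java API; the example.org namespace for #Joe and friends is an assumption made purely for illustration.

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
import com.hp.hpl.jena.reasoner.rulesys.Rule;
import com.hp.hpl.jena.vocabulary.RDF;

public class TypeInference {
    public static void main(String[] args) {
        String ns = "http://example.org/demo#"; // assumed namespace
        Model model = ModelFactory.createDefaultModel();

        Resource joe = model.createResource(ns + "Joe");
        Resource student = model.createResource(ns + "Student");
        Resource human = model.createResource(ns + "Human");
        Resource animal = model.createResource(ns + "Animal");

        // Statements 0-2: Joe is a Student, a Student is a Human, a Human is an Animal
        model.add(joe, RDF.type, student);
        model.add(student, RDF.type, human);
        model.add(human, RDF.type, animal);

        // Inference Rule 1: two chained rdf:type links imply a third
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(
            "[rule1: (?z rdf:type ?y) (?y rdf:type ?x) -> (?z rdf:type ?x)]"));

        InfModel inf = ModelFactory.createInfModel(reasoner, model);

        // Prints the asserted and inferred types of Joe (Student, Human and, transitively, Animal)
        StmtIterator it = inf.listStatements(joe, RDF.type, (RDFNode) null);
        while (it.hasNext()) {
            System.out.println(it.nextStatement());
        }
    }
}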

 

Could this not be achieved in a traditional RDBMS with some inference logic?

Yes, but traditional RDBMS systems are not designed for representing and inferring semantic facts, so they are not really suited to it; dedicated semantic tools and data stores provide:

  • consistent performance
  • specialised optimisation of execution
  • standard interfaces for modifying and reasoning over facts
  • mechanisms to efficiently understand/execute semantic rules

So there is less needless re-implementation than there would be if you were to use a pre-existing DB system and write inference logic in custom blocks of your chosen programming language.

 

What does this mean for us?

Ultimately, computer agents that can understand content and human communication, and that will reliably be able to provide facts from large collections of this content. An outcome could be a computer agent that could replace every diagnostic expert, a personal assistant application akin to Siri that is 100% reliable and can handle any knowledge request a human could (given a global dataset), or a system that uses semantic reasoning to decide whether a webpage contains meaningful information or is just an SEO-optimised spam/link-bait advertisement page (these may fool search engines).


Search engine spam

I eagerly anticipate the day when search engines are fully semantic and can understand and reason about any query to give flawless and succinct responses drawn from the world's knowledge.

Further Reading

  1. http://www.ics.forth.gr/isl/swprimer/
  2. http://notabug.com/2002/rdfprimer/
  3. http://en.wikipedia.org/wiki/Inference
  4. http://en.wikipedia.org/wiki/First-order_logic
  5. http://en.wikipedia.org/wiki/Web_Ontology_Language
  6. http://www.lesliesikos.com/tutorials/rdf/
  7. http://en.wikipedia.org/wiki/Resource_Description_Framework

Manipulating / Resizing Images using Classic ASP

By Classic ASP do you mean the COBOL of early 21st Century web development that hasn’t expanded functionality in what seems like forever?

Yes that Classic ASP, the one that you may encounter when maintaining enterprise codebases.

What magic incantation do I have to use?

Classic ASP supports 3 main languages: VBScript, JScript and PerlScript. Although they are not known for strong OO, they can access COM/OLE/system objects, and this is how we are going to manipulate images.

Standing on the shoulders of giants

There are a few components that we can leverage to this end, but this tutorial will cover the legendary and reliable ImageMagick suite. Download the version that you desire (I used the Q16 win-x64 static version), restart your server and then fire up your notepad.

The code

In the below code we will be resizing the height of an image while preserving the aspect ratio

 

'Declare the variables
Dim srcImg, fso, imageMagick, tempVar, targImgPath, imgPath, image_width, image_height, newHeight

'New image height
newHeight = 400

'Create a File System Object for manipulating files
Set fso = CreateObject("Scripting.FileSystemObject")

'Declare the virtual path (relative to the web root) of the image we wish to manipulate
targImgPath = "/image.jpg"

'Get the physical path of the image that we wish to manipulate by using the Server.MapPath function
imgPath = Server.MapPath(targImgPath)

'Get the image file that we wish to manipulate via the fso.GetFile method
'and read its dimensions using the LoadPicture operation
Set srcImg = LoadPicture(fso.GetFile(imgPath).Path)

'Get the current image height and width
image_width = srcImg.Width
image_height = srcImg.Height

'Calculate the aspect ratio of the image
tempVar = image_width / image_height

'Calculate the new width of the image
image_width = Round(newHeight * tempVar)

'Create an ImageMagick object to manipulate the image
Set imageMagick = Server.CreateObject("ImageMagickObject.MagickImage.1")

'Resize the image to the new width and height and write it to the desired output path
imageMagick.Convert imgPath, "-resize", image_width & "x" & newHeight, "C:\resizedimage.jpg"

Simple! And this is only the tip of the iceberg; ImageMagick has lots of functions that you can invoke in a similar manner :)

Using Java Classes in a .NET project

Let me guess, witchcraft?

No witchcraft was involved (directly anyway; I did sacrifice a lamb to get some Blue Mountain coffee to keep myself awake while looking through alternative solutions), and neither is any mangling of Java source code into C# form.

But .NET has everything!!!!

Java has a rich ecosystem with many reusable projects, which is great if you are working in a Java environment, but if you need to develop a program using Microsoft .NET technologies and wish to leverage a Java class then you may be SOL (sadly outta luck).

I was in this situation a while ago when integrating some Java-only semantic web tools (Pellet and Jena) into a .NET web project; the solution I came to was to use an implementation of Java for Mono/.NET called IKVM.

IKVM allows cross-compilation of Java JAR files to .NET-compatible DLL files which can then be imported into your .NET project.

This conversion process looks similar to

image

Lets go!

Download the IKVM toolset and extract it to a memorable location (e.g. c:\ikvm), and then compile or download a .JAR file for the Java classes that you wish to use.

Once you have obtained your .JAR file you need to compile it to a .DLL. To do this, put the .JAR into the same folder as your ikvmc.exe file (above we extracted this to c:\ikvm\bin) and then perform the conversion.

The command to run the conversion may vary if your Java component has dependencies; to read more about this see the IKVM documentation.

A simple conversion of the Jena.JAR files

ikvmc .\jena\*.jar -target:library -out:.\dotNet-Libs\jena.dll -version:2.5.4.0 -nowarn:IKVMC0109

This operation will be performed on the following Jena related .JAR files

image

This produces the following output; the warnings refer to dependencies that Jena requires which have not been satisfied in this cross-compile. These will be compiled to DLL files in a later step.

image

This will produce a single jena.dll file in the dotNet-Libs folder. This DLL file will need to be imported into the .NET project in addition to all of the standard IKVM.OpenJDK and IKVM reflection and runtime DLL files (these provide a runtime).

To compile the entire set of Pellet 1.5.2, MSSQL JDBC driver and Jena 2.5.4 JAR files (with dependencies) it may be best to run a batch file with the following series of operations (assuming you have the correct JAR files in the listed folders):

mkdir dotNet-Libs
mkdir pellet
mkdir sqldriver
move aterm-java-1.6.jar .\pellet
move pellet-core.jar .\pellet
move pellet-jena.jar .\pellet
move servlet.jar .\pellet
move sqljdbc.jar .\sqldriver
rem Convert Pellet 1.5.2 auxiliary libraries
ikvmc .\xsdlib\*.jar -target:library -out:.\dotNet-Libs\xsdlib.dll -version:1.5.2.0 -nowarn:IKVMC0109
ikvmc .\jetty\*.jar -target:library -out:.\dotNet-Libs\jetty.dll -version:1.5.2.0 -nowarn:IKVMC0109
ikvmc .\junit\*.jar -target:library -out:.\dotNet-Libs\junit.dll -version:1.5.2.0 -nowarn:IKVMC0109
ikvmc .\owlapi\*.jar -target:library -out:.\dotNet-Libs\owlapi.dll -version:1.5.2.0 -nowarn:IKVMC0109
rem Convert Jena 2.5.4
ikvmc .\jena\*.jar -target:library -out:.\dotNet-Libs\jena.dll -version:2.5.4.0 -nowarn:IKVMC0109
rem Convert Pellet 1.5.2 libraries
ikvmc .\pellet\*.jar -target:library -reference:.\dotNet-Libs\xsdlib.dll -reference:.\dotNet-Libs\jetty.dll -reference:.\dotNet-Libs\junit.dll -reference:.\dotNet-Libs\owlapi.dll -reference:.\dotNet-Libs\jena.dll -out:.\dotNet-Libs\pellet.dll -version:1.5.2.0 -nowarn:IKVMC0109
rem Convert Microsoft jdbc SQL Driver 1.2
ikvmc .\sqldriver\*.jar -target:library -out:.\dotNet-Libs\sqldriver.dll -version:1.2.0.0 -nowarn:IKVMC0109
copy IKVM.*.dll .\dotNet-Libs\
pause

Once you have generated all the Java-origin DLL files you will need to import them into your .NET project in the usual way (if you don't know how to do this I suggest that you trawl MSDN for a few hours to learn the basics of .NET development).

Bumps in the road?

Yes, you need to be wary of namespace conflicts: e.g. if both .NET and Java have an available object called shinyObject you will need to explicitly use the FULL name of this object in your code, as shown when creating the Jena Statement and GenericRuleReasoner objects in the following code snippet:

using System;
using System.Data;
using System.Configuration;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using System.IO;
using SpeechLib;
using System.Media;

using System.Collections;

//Java Functionality
using ikvm.io;
using java.io;
using ikvm.lang;

//Import Pellet
using org.mindswap.pellet;
using org.mindswap.pellet.jena;

//IMPORT JENA TOOLS
using com.hp.hpl;
using com.hp.hpl.jena;
using com.hp.hpl.jena.db.impl;
using com.hp.hpl.jena.ontology;
using com.hp.hpl.jena.rdf.model;
using com.hp.hpl.jena.reasoner;
using com.hp.hpl.jena.query;
using com.hp.hpl.jena.util.iterator;

//JENA TDB
using com.hp.hpl.jena.tdb;
using com.hp.hpl.jena.tdb.store;
using com.hp.hpl.jena.tdb.@base.file;

/*
 * 'Statement' is an ambiguous reference between 'SemWeb.Statement' and 'com.hp.hpl.jena.rdf.model.Statement'
 */

public class sourMash
{
    //A Jena Model RDF Statement
    private static com.hp.hpl.jena.rdf.model.Statement jenaStatement = null;

    //Generate a Rule Reasoner
    com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner reasoner = null;
}

 

I hope this helps you in the beginning of your search for harmony between your development tools.

For convenience I have attached all the JAR files, DLL files, tools and scripts (ik-com.bat) required to compile and import Pellet and Jena in .NET here: Pellet and Jena dotNET to Java Archive (this archive requires the excellent open source 7-Zip file manager).

ReFS

Re-Implemented FS?

Re-Designed FS?

Resilient FS?

It’s actually all of the above.

ReFS is the new file system that Microsoft’s Steven Sinofsky announced on 16 Jan 2012. Unlike previous attempts by Microsoft to develop a new FS (WinFS, an unholy and aborted RDBMS/OODBMS-as-a-FS that sat on NTFS), this represents a much more logical and less insane evolution of file systems, reminiscent in some minor ways of the awesome ZFS (which powers my home server, incidentally) developed by the group then called Sun Microsystems (they’re dead, eaten by a Grue/Oracle).

What can it do?

After reading the MS posts about this system I can see 3 huge advantages to using this FS.

1) GREATER INTEGRITY

The metadata that the OS uses is verified with a 64-bit checksum, the product of a mathematical operation over a series of data that creates a fairly unique digest of the contents of that data.

Simply put, a checksum is a fairly unique fingerprint of data which should change if as little as a single bit in a data set is modified. For example, using the SHA1 checksum the phrase “The name is bond james bond” is digested to “47b5b5cb374b6adf5523aff8b45c742cb03cda48” but “The name is bond James bond” is digested to “a497f3764a972469d89b4d689eeaf71779b8ec7b”. This is used to verify that what we have written to a disk is as intended, which is important as no current storage mechanism is 100% reliable and failures can lead to hidden corrupted files in the future.

This would function something like this:

[Hold data in RAM] >> [Calculate Checksum of Data: CHK0] >> [Write Data to Disk] >> [Read Data from disk and calculate its Checksum:CHK1]  >> [Compare CHK0 with CHK1, if they differ then attempt to write again]
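As a toy sketch of that write-then-verify loop (SHA-1 in Java is used here purely as a stand-in for ReFS’s internal 64-bit checksums, and the file name is hypothetical):

import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.Arrays;

public class VerifiedWrite {
    // Stand-in checksum: digest a byte array with SHA-1
    static byte[] checksum(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "The name is bond james bond".getBytes(StandardCharsets.UTF_8);
        Path target = Paths.get("block.dat"); // hypothetical on-disk block

        byte[] chk0 = checksum(data);                           // [Calculate Checksum of Data: CHK0]
        for (int attempt = 1; attempt <= 3; attempt++) {
            Files.write(target, data);                          // [Write Data to Disk]
            byte[] chk1 = checksum(Files.readAllBytes(target)); // [Read back and calculate CHK1]
            if (Arrays.equals(chk0, chk1)) {                    // [Compare CHK0 with CHK1]
                System.out.println("write verified on attempt " + attempt);
                return;
            }                                                   // ...if they differ, write again
        }
        System.err.println("could not produce a verified copy of the data");
    }
}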

Checksumming of the metadata is automatic, but for file contents it is optional (via “integrity streams”); this makes me unhappy as I prefer to have my data verified as well, so my home server will stick with ZFS.

 

2) SCALABLE

Using B+ trees exclusively, this FS offers scalable storage capacity and can grow while still offering efficient operation times.

This means that limits on file size, volume size and the number of files/directories are now bounded only by 64-bit numbers: the maximum volume size is 1 yobibyte (2^80 bytes) when using 64 KB clusters, with a maximum file size of 16 exbibytes (2^64 bytes). For perspective, a tebibyte is 2^40 bytes, so we are talking about limits that shouldn’t affect us for a while (us as a species; I’ll likely be dead by then… as will you).
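As a quick sanity check of those figures, here is the arithmetic (a throwaway sketch assuming 2^64 addressable clusters of 64 KB each):

import java.math.BigInteger;

public class RefsLimits {
    public static void main(String[] args) {
        BigInteger two = BigInteger.valueOf(2);

        // 2^64 clusters * 64 KiB (2^16 bytes) per cluster = 2^80 bytes = 1 yobibyte
        BigInteger maxVolumeBytes = two.pow(64).multiply(two.pow(16));
        System.out.println("max volume == 2^80 bytes? " + maxVolumeBytes.equals(two.pow(80)));

        // 2^64 bytes per file = 16 * 2^60 bytes = 16 exbibytes
        BigInteger maxFileBytes = two.pow(64);
        System.out.println("max file in exbibytes: " + maxFileBytes.divide(two.pow(60)));
    }
}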

 

 

3) FOREVER ONLINE

OK, that’s a little bit of bait: if you turn the system off the disk will not be available, but if you wish to perform a low-level operation (like a disk check) you shouldn’t need to take the volume offline. No more rebooting to fix corruption (which shouldn’t affect ReFS as much anyway; a little bit of irony here).

Where does this fit in?

At the core of windows, deep in the Kernel.

NTFS.SYS = NTFS upper layer API/semantics engine / NTFS on-disk store engine; ReFS.SYS = Upper layer engine inherited from NTFS / New on-disk store engine

Above is the image produced by Microsoft representing the change in FS architecture; in case you haven’t noticed, it is as clear as tar and low on detail (what is the bump? Shared features to be relocated? Implementation bloat? A collection of ugly hacks?).

The blue blocks represent API calls and logic for software to consume the features offered by this FS driver. Much like NTFS.sys, I would think that these calls are not to be consumed by end programmers but instead by the OS kernel, and abstracted into programmer API calls (http://technet.microsoft.com/en-us/library/cc781134(WS.10).aspx); if performed correctly this migration should be transparent to application software.

 

NTFS Architecture

 

The red blocks are where all the exciting changes take place. For example, a blue-box command called writeBytes() could be the same in both NTFS and ReFS, but the red-block logic for NTFS could simply write the data to a free area, while the ReFS implementation could be similar to the write function previously described in this article, which (for the lazy) is:

[Hold data in RAM] >> [Calculate Checksum of Data: CHK0] >> [Write Data to Disk] >> [Read Data from disk and calculate its Checksum:CHK1] >> [Compare CHK0 with CHK1, if they differ then attempt to write again]

Then why is the blue box bigger than the red box?

The blue-box commands each abstract a combination of common red-box commands to achieve a goal, e.g. a red-box ‘open directory’ command could be combined with a red-box ‘list files/directories’ function to provide a blue-box search function.

What’s the catch?

Well there are 3

1) This is untested

This is a big drawback; although I’m sure MS have tested every aspect of this system with dog-fooding, fuzzing and unit tests, there is still the risk that some aspect of this FS will not be implemented 100% correctly and will kill your data and eat your children if given the chance!

If MS want me to trust this they should move all of their internal systems and source code repos to ReFS and inform the world what happened 2 years later… this won’t happen, so I will wait until all the bugs have been repaired.

2) Windows Server 8 only and not backwards compatible

A ReFS volume cannot be accessed by Windows 7/2008 R2 or earlier, so if your one Windows 8 server dies you may have to wait for a reinstall to get essential data off your SAN; if this delays the payroll processing for your company you had better blame a virus, or there may be an old-fashioned angry mob at your door with cacti.

3) Not Default and non-bootable

This FS is not the default in Windows 8 Server and cannot be booted from; this says a lot about Microsoft’s confidence in their new baby.

More potential issues

It appears that MS will be deprecating some NTFS features; of these, 2 concern me most: EFS and hard links.

EFS

EFS is the Encrypting File System that has been incorporated into Windows since Windows 2000 and is used to secure data in corporate and personal environments (NTFS permissions mean nothing if the ‘protected’ disk is accessed with a live Linux disc or another Windows install, as they are easily overcome). I think this feature may be the bump in the red box.

I hope that the companion Storage Spaces feature or some other user-land tool will add support for EFS, as this is an essential feature for backwards compatibility. That said, BitLocker seems to be the way of the future for Windows encryption, so this could be the end of EFS (you will be moving files to the new volume anyway).

HARD LINKS

Removing hard links removes POSIX compatibility; this could mean that systems such as Cygwin may not function any more on a ReFS volume, which may be bad for academic workstations that need GCC and other such UNIX tools on Windows (this is essential for system modelling with SPIN).

OTHER REMOVALS

There are some other features being removed, but I personally am not worried as user-land tools could easily replace these functions (and compression is less relevant in the era of 3 TB hard disks):

Q) What semantics or features of NTFS are no longer supported on ReFS?

The NTFS features we have chosen to not support in ReFS are: named streams, object IDs, short names, compression, file level encryption (EFS), user data transactions, sparse, hard-links, extended attributes, and quotas.

 

More info:

http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx