wissel.net

Usability - Productivity - Business - The web - Singapore & Twins


Long Term Storage and Retention


Time is relative, and not just since Einstein. For a human brain anything above 3 seconds is long term. In IT this is a little more complex.

Once a work artefact is completed, it runs through legal vetting and then goes to either medium- or long-term storage. I'll explain the difference in a second. This logical flow manifests itself in multiple ways in concrete implementations: journaling (both eMail and databases), archival, backups, write-once copies. Quite often all artefacts go to medium-term storage anyway and only make it into long-term storage when legal criteria are met. Criteria can be:
  • Corporate & trade law (e.g. the typical retention period in Singapore is 5 years)
  • International law
  • Criminal law
  • Contractual obligations (e.g. in the airline industry all plane-related artefacts need to be kept at least until the last plane of that family has retired; the Boeing 747 family, for example, has been in service for more than 40 years)
For a successful retention strategy three challenges need to be overcome:
  1. Data Extraction

    When your production system doesn't provide retention capabilities, how do you get the data out? In Domino that's not an issue, since it has provided robust storage for 25 years (you still need to back up your data). However, if you want a cross-application solution, have a look at IBM's Content Collector family of products (of course other vendors have solutions too, but I'm not on their payroll)
  2. Findability

    Once an artefact is in the archive, how do you find it? Both navigation and search need to be provided. Here a clever use of metadata (who, what, when, where) makes the difference between a useful system and a bit graveyard. Metadata isn't an abstract concept, but the ISO 16684-1:2012 standard. And - YES - it uses the Dublin Core, not to be confused with Dublin's ale (a minimal tagging sketch follows after this list)
  3. Consumability / Resilience

    Once you have found an artefact, can you open and inspect it? This very much boils down to: do you have software that can read and render the file format?
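Since ISO 16684-1 is the XMP standard, the Dublin Core fields can travel inside the artefact itself. Here is a minimal tagging sketch, assuming ExifTool is installed (the file name and field values are made up for illustration):

# Embed Dublin Core (XMP-dc) fields into a PDF before archiving
exiftool -XMP-dc:Title="Supplier contract 2014" \
         -XMP-dc:Creator="Jane Doe" \
         -XMP-dc:Date="2014-08-14" \
         contract.pdf

# Read them back later, e.g. to rebuild a search index
exiftool -XMP-dc:All contract.pdf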
The last item (and to some extent the second) makes the difference between mid-term and long-term storage. In a mid-term storage system you presume that, short of potential version upgrades, your software landscape doesn't change and the original software is still actively available when a need for retrieval arises. Furthermore, you expect your retention system to stay the same.
On the other hand, in a long-term storage scenario you can't rely on specific software for either search or artefact rendering, so you need to plan a little more carefully. Most binary formats fall short of that challenge. Furthermore, your artefacts must be able to "carry" their metadata, so a search application can rebuild an access index when needed. That is one of the reasons why airline maintenance manuals are stored in DITA rather than an office format (note: docx is not compliant with ISO/IEC 29500 Strict).
The problem domain is known as Digital Preservation and has a reference implementation and congressional attention.
In a nutshell: keep your data as XML, PDF/A or TIFF. MIME could work too; it is good with metadata after all, and it is native to eMail. The MIME trap to avoid: MIME parts that are proprietary binary (e.g. your attached office document). So proceed with caution.
Neither PST, OST nor NSF is suitable for long-term storage (you can still use the NSF as the search database).
To be fully safe, a long-term store would retain the original format (if required) as well as a vendor-independent format.
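For the PDF/A route, a conversion can be sketched with Ghostscript. This is a best-effort sketch, assuming a reasonably recent Ghostscript; full PDF/A conformance additionally needs an ICC profile supplied via a PDFA_def.ps file, which is omitted here:

# Best-effort conversion of an existing PDF to PDF/A-2
gs -dPDFA=2 -dBATCH -dNOPAUSE \
   -sDEVICE=pdfwrite \
   -dPDFACompatibilityPolicy=1 \
   -sOutputFile=archive.pdf input.pdf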

Read more

Posted on 14 August 2014 | Comments (1) | categories: IBM Notes Software

Time stamped encrypted archives


Developers use version control, business users document management, and consultants ZIP files.
From time to time I feel the need to safeguard a snapshot in time outside the machine I'm working on. Since "storage out of my control" isn't trustworthy, I encrypt the data. This is the script I use:

#!/bin/bash
############################################################################
# Saves the given directory ($1) in an SSL encrypted zip file ($2) in the
# destination folder ($3, default: current directory). The name of the ZIP
# file needs to be without the zip extension but might already contain
# the date.
############################################################################
# Adjust these three values to your needs. Don't use ~, otherwise the
# script doesn't work when you use sudo.
# Note: keyfile must be an X.509 certificate, since openssl smime
# encrypts against a certificate, not a bare public key
tmplocation=/home/user/temp/
keyfile=/home/user/.ssh/pubkey.pem
privatekey=/home/user/.ssh/privkey.pem
if [ -z "$3" ]
then
    securelocation=./
else
    securelocation=$3
fi
fullzip=$tmplocation$2.zip
fulldestination=$securelocation$2.szip
securesource=$1

# If the final file exists, we decrypt it first so it can be updated
if [ -f "${fulldestination}" ]
then
    echo "Decrypting ${fulldestination}..."
    openssl smime -decrypt -in "${fulldestination}" -binary -inform DER \
        -inkey "${privatekey}" -out "${fullzip}"
    # Update the ZIP file from the directory
    echo "Updating from ${securesource}"
    zip -ru "${fullzip}" "${securesource}"
else
    echo "Creating from ${securesource}"
    zip -r "${fullzip}" "${securesource}"
fi

# Encrypt it
echo "Encrypting ${fulldestination}"
openssl smime -encrypt -aes256 -in "${fullzip}" -binary -outform DER \
    -out "${fulldestination}" "${keyfile}"
# Securely remove the unencrypted temp file
shred -u "${fullzip}"
notify-send -t 1000 -u low -i gtk-dialog-info "Secure backup completed: ${fulldestination}"

To make that work, you need encryption keys, which you can create yourself.
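A one-time generation sketch, assuming the file names from the script above. Since openssl smime encrypts against an X.509 certificate, a self-signed certificate serves as the "public key" file:

#!/bin/bash
# One-time setup: create a private key plus a matching
# self-signed certificate, valid for ten years
openssl req -x509 -nodes -days 3650 -newkey rsa:4096 \
    -subj "/CN=Secure Backup" \
    -keyout /home/user/.ssh/privkey.pem \
    -out /home/user/.ssh/pubkey.pem
# Keep the private key readable only by its owner
chmod 600 /home/user/.ssh/privkey.pem

A typical script to call the script above would look like this: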

#!/bin/bash
############################################################################
# Save the network connections from /etc/NetworkManager/system-connections
# in an SSL encrypted zip file
############################################################################
securesource=/etc/NetworkManager/system-connections
# Save one version per day
now=$(date +"%Y%m%d")
# Save one version per month
# now=$(date +"%Y%m")
zipfile=networkconnections_$now
securelocation=/home/user/allmyzips/
zipAndEncrypt "$securesource" "$zipfile" "$securelocation"

When you remove the decryption part (one-time creation only, no update), you only need access to the public key certificate, which you could share, so someone else can provide you with a zip file encrypted just for you.
As usual: YMMV.

Posted on 13 August 2014 | Comments (0) | categories: Linux

Designing a REST API for eMail


Unencumbered by standards designed by committees, I'm musing about what a REST API for eMail might look like.
A REST API consists of 3 parts: the URI (~ the URL used for browser access), the verb and the payload. Since I'm looking at browser-only access, the structured data payload format clearly will be JSON, with the prose payload delivered in MIME format. I will worry about calendar and social data later on.
The verbs in REST are defined by the HTTP standard: GET, POST, PUT and DELETE. My base URL would then continue with an additional, resource-specific part. Combined with the 4 horsemen verbs I envision the following action matrix:
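As a taste of that matrix, here is a hedged sketch of how the four verbs could map onto mail resources; all endpoints and IDs are hypothetical, not the final API:

#!/bin/bash
# Hypothetical endpoints for illustration only
BASE=https://example.com/api/mail

# GET: retrieve the inbox as JSON
curl -X GET "$BASE/inbox"

# POST: create (and send) a new message
curl -X POST "$BASE/inbox" \
     -H "Content-Type: application/json" \
     -d '{"to":"jane@example.com","subject":"Hello","body":"Hi Jane"}'

# PUT: update an existing draft
curl -X PUT "$BASE/drafts/4711" \
     -H "Content-Type: application/json" \
     -d '{"subject":"Hello again"}'

# DELETE: remove a message
curl -X DELETE "$BASE/inbox/4711"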

Read more

Posted on 08 August 2014 | Comments (2) | categories: IBM Notes vert.x

Running vert.x with the OpenNTF Domino API


In the first part I got vert.x 3.0 running with my local Notes client. The challenges mastered there were 32-bit Java for the Notes client and the usual adjustments to the path variables. Adopting the OpenNTF Domino API required a few more steps:
  1. Set 2 environment variables:
    DYLD_LIBRARY_PATH=/opt/ibm/notes
    LD_LIBRARY_PATH=/opt/ibm/notes
  2. Add the following parameter to your Java command line:
    -Dnotes.binary=/opt/ibm/notes -Duser.dir=/home/stw/lotus/notes/data -Djava.library.path=/opt/ibm/notes
    Make sure that it is one line only. (Of course you will adjust the paths to your environment, won't you?)
  3. Add 4 JAR files to the classpath of your project runtime:
    • /opt/ibm/notes/jvm/lib/ext/Notes.jar
    • /opt/ibm/notes/framework/rcp/eclipse/plugins/
      com.ibm.icu.base_3.8.1.v20080530.jar
    • org.openntf.domino.jar
    • org.openntf.formula.jar
    I used the latest build of the latter two JARs from Nathan's branch, so make sure you have the latest. The ICU plug-in is based on the International Components for Unicode project and might get compiled into a future version of the Domino API. A launch script pulling these steps together is sketched below.
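Putting the three steps together, a launch script could look like the following sketch; the paths follow the examples above, while the project JAR and the main class are placeholders for your own code:

#!/bin/bash
# Sketch: run a vert.x application against a local Notes client
export DYLD_LIBRARY_PATH=/opt/ibm/notes
export LD_LIBRARY_PATH=/opt/ibm/notes

CP=/opt/ibm/notes/jvm/lib/ext/Notes.jar
CP=$CP:/opt/ibm/notes/framework/rcp/eclipse/plugins/com.ibm.icu.base_3.8.1.v20080530.jar
CP=$CP:org.openntf.domino.jar:org.openntf.formula.jar:myproject.jar

# The -D parameters form one logical command line;
# backslash continuations keep the script readable
java -Dnotes.binary=/opt/ibm/notes \
     -Duser.dir=/home/stw/lotus/notes/data \
     -Djava.library.path=/opt/ibm/notes \
     -cp "$CP" com.example.MyLauncher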
Now the real fun begins. The classic Java API is conceptually single-threaded, with all Domino actions wrapped between NotesThread.sinitThread(); and NotesThread.stermThread(); to gain access to the Notes C API. For external applications (the ones running neither as XPages/OSGi nor as agents), the OpenNTF API provides the Domino Executor.

Read more

Posted on 06 August 2014 | Comments (1) | categories: IBM Notes vert.x