​G​ovHack Tasmania 2017

GovHack is an annual event held simultaneously around Australia to provide an opportunity for teams to explore, mashup and use open government data over a 46 hour period.

GovHack 2017 is happening on the weekend of 28th July - 30th July, 2017 starting on Friday night at 7pm.

For more information see GovHack.org.

Open data

LINC Tasmania is committed to delivering open data as part of providing free access to information resources and support in their use.

Over a million LINC Tasmania records are available in a variety of file formats as part of the open datase​ts available at http://data.gov.au/organization/linctasmania under a Creative Commons Attribution 4.0 licence.

Visit our page on open data for more information or to access LINC Tasmania's open data sets.

Data formats

LINC Tasmania provide our open data in to 3 main formats.

  • JSON

    JSON (JavaScript Object Notation) is a syntax for storing data and works well with opendata sets. It's very easy for the user to fetch and manipulate and is easily parsed in many different programming languages. JSON is a good choice for processing data at both the server side and with javascript on the client side.

    Note: Our datasets are very large and are probably too large to be loaded at the client side so we recommend only using JSON at the client side if you are pulling smaller sections of a dataset via the data.gov.au API.

    Here is an example of some JSON data that we provide:
             {
        "1449570": {
            "CATKEY": "(Sirsi) 1449570",
            "FORMAT": "VIEW",
            "INDEX": "Prisoners",
            "LINC_TAS_AVAIL": "Online",
            "NAME": "Allen, William",
            "NAME_FULL_DISPLAY": "Allen, William",
            "PERMA_LINK": "https://linctas.ent.sirsidynix.net.au/client/en_AU/all/search/detailnonmodal/ent:$002f$002fNAME_INDEXES$002f0$002fNAME_INDEXES:1449570/one",
            "PUBDATE": [
                "1900",
                "1901",
                "1902"
            ],
            "PUBDATE_RANGE": "1900-1902",
            "REFERENCE_URL": {
                "URL_TEXT": "GD128/1/1",
                "URL": "https://stors.tas.gov.au/GD128-1-1$init=GD128-1-1p155j2k"
            },
            "TASMANIAN": [
                "Published in Tasmania",
                "About Tasmania",
                "By a Tasmanian"
            ],
            "YEAR": "1900-1902",
            "YEAR_DISPLAY": "1900-1902"
        }
    }
    
  • XML

    XML is a mark-up language and works very well at storing our open datasets due to their hierarchical nature. XML and JSON provide a basically the same data but you will most likely require an XML parser to navigate the XML document or possibly use regular expressions to extract the required data. Being that XML supports attributes the data may be represented in a slightly different way in XML than the same data in JSON.

    Here is an example of the same record in XML:
             <?xml version="1.0"?>
    <records>
    	<record id="1449570">
    		<PERMA_LINK>https://linctas.ent.sirsidynix.net.au/client/en_AU/all/search/detailnonmodal/ent:$002f$002fNAME_INDEXES$002f0$002fNAME_INDEXES:1449570/one</PERMA_LINK>
    		<REFERENCE_URL>
    			<URL_TEXT>GD128/1/1</URL_TEXT>
    			<URL>https://stors.tas.gov.au/GD128-1-1$init=GD128-1-1p155j2k</URL>
    		</REFERENCE_URL>
    		<TASMANIAN>Published in Tasmania</TASMANIAN>
    		<TASMANIAN>About Tasmania</TASMANIAN>
    		<TASMANIAN>By a Tasmanian</TASMANIAN>
    		<INDEX>Prisoners</INDEX>
    		<PUBDATE_RANGE>1900-1902</PUBDATE_RANGE>
    		<NAME_FULL_DISPLAY>Allen, William</NAME_FULL_DISPLAY>
    		<LINC_TAS_AVAIL>Online</LINC_TAS_AVAIL>
    		<NAME>Allen, William</NAME>
    		<FORMAT>VIEW</FORMAT>
    		<PUBDATE>1900</PUBDATE>
    		<PUBDATE>1901</PUBDATE>
    		<PUBDATE>1902</PUBDATE>
    		<YEAR>1900-1902</YEAR>
    		<YEAR_DISPLAY>1900-1902</YEAR_DISPLAY>
    		<CATKEY>(Sirsi) 1449570</CATKEY>
    	</record>
    </records>
    
  • CSV

    CSV is just simple comma-separated values. We provide all of our data in CSV because data.gov.au is able to produce an API queryable dataset from CSV records. The issue with CSV is it's difficult to represent hierarchical data structures so we have used delimiters like pipe to separate values within a single value. Due to this we recommend using XML or JSON when possible for a more complete representation of the data.

    Here is an example of the same record in CSV:
             CATKEY,FORMAT,INDEX,LINC_TAS_AVAIL,NAME,NAME_FULL_DISPLAY,PERMA_LINK,PUBDATE,PUBDATE_RANGE,REFERENCE_URL,TASMANIAN,YEAR,YEAR_DISPLAY,NAME_SEE_ALSO
    "(Sirsi) 1449570","VIEW","Prisoners","Online","Allen, William","Allen, William","https://linctas.ent.sirsidynix.net.au/client/en_AU/all/search/detailnonmodal/ent:$002f$002fNAME_INDEXES$002f0$002fNAME_INDEXES:1449570/one","1900|1901|1902","1900-1902","GD128/1/1|https://stors.tas.gov.au/GD128-1-1$init=GD128-1-1p155j2k","Published in Tasmania|About Tasmania|By a Tasmanian","1900-1902","1900-1902",""

Using the open datasets

There are countless different ways of using the open datasets but here is an example of how to get started.

For this example I will use PHP to calculate the top 10 most popular baby names from 1803-1933.
I will start off by writing a function for fetching json data from data.gov.au

   /*
Function for fetching JSON from CKAN installations
- LINC Tasmania (www.linc.tas.gov.au)
*/
function fetchJSON($id) {
	//CURL options
	$ckanApiAddress = 'http://data.gov.au/api/';
	$proxyHost = 'proxy.education.tas.gov.au'; //insert proxy host or false to not user a proxy
	$proxyPort = '8081';
	$proxyUser = false; //insert proxy host or false to not user a proxy
	$proxyPassword = '';
	
	//CURL init
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	if ($proxyHost !== false) {
		curl_setopt($ch, CURLOPT_PROXY, $proxyHost);
		curl_setopt($ch, CURLOPT_PROXYPORT, $proxyPort);
		curl_setopt($ch, CURLOPT_PROXYTYPE, 'HTTP');
		if ($proxyUser !== false) {
			curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyUser . ':' . $proxyPassword);
		}
	}
	
	//Fetch download URL
	curl_setopt($ch, CURLOPT_URL, $ckanApiAddress . 'action/resource_show?id=' . $id);
	$data = curl_exec($ch);
	
	$data = json_decode($data);
	$dlUrl = $data->result->url;
	
	//Downloading JSON
	curl_setopt($ch, CURLOPT_URL, $dlUrl);
	$data = curl_exec($ch);
	$data = json_decode($data);
	curl_close($ch);
	
	return $data;
}
Using this function I will fetch the births dataset and count the occurrence of every name.
   $birthRecords = fetchJSON('4e30dd59-f38d-489a-8ee5-817ac70e5f1a');

$names = array();
$count = 0;
foreach ($birthRecords as $record) {
	$givenName = $record->NAME_FULL_DISPLAY;
	if (strpos($givenName,', ') === false) continue;
	if (strpos($givenName,'Not Recorded') !== false) continue;
	$count++;
	$givenName = substr($givenName, strpos($givenName,', ') + 2);
	if (isset($names[$givenName])) $names[$givenName]++;
	else $names[$givenName] = 1;
}
//Sort name data desc
arsort($names);

echo 'Top 10 most popular baby names from 1803-1933', "\r\n";
$i = 0;
foreach ($names as $name => $num) {
	echo $name, ' (', $num, ' of ', $count, ') ', number_format( $num/$count * 100, 2 ), '%', "\r\n";
	$i++;
	if ($i == 10) break;
}
After running our code we get the following output:
   Top 10 most popular baby names from 1803-1933
John (3800 of 191615) 1.98%
William (3568 of 191615) 1.86%
James (2616 of 191615) 1.37%
Mary Ann (2359 of 191615) 1.23%
Thomas (2104 of 191615) 1.10%
Elizabeth (2024 of 191615) 1.06%
Mary (2021 of 191615) 1.05%
George (1827 of 191615) 0.95%
Henry (1365 of 191615) 0.71%
Charles (1207 of 191615) 0.63%
As you can see it's relatively easy to perform reports on JSON datasets using PHP.

Using the data.gov.au API

data.gov.au provides an API for querying datasets within their collection. Unfortunately only our CSV datasets are currently available via this API.
The API URL for Data.gov.au is http://data.gov.au/api/ and you can view the full API guide here.

The CKAN API is useful for front end ajax style calls.
One of the abilities made available by the API is the ability to perform SQL syntax queries against datasets
You just use the ID as the table name and field names from the first row of the CSV to perform your SQL query.
Note: you must wrap the table and field names in double quotes.

Here is an example of a SQL statement that I can use to query a dataset:
   SELECT count("DEPARTURE_PORT") as count, "DEPARTURE_PORT" from "8d3e12fa-a9fd-4268-9164-6240a97ff44d" WHERE "DEPARTURE_PORT" != '' GROUP BY "DEPARTURE_PORT" ORDER BY count DESC LIMIT 10
To perform that SQL query I simply encode my SQL statement and pass it via a query string parameter like so:
   https://data.gov.au/api/action/datastore_search_sql?sql=SELECT%20count(%22DEPARTURE_PORT%22)%20as%20count,%20%22DEPARTURE_PORT%22%20from%20%228d3e12fa-a9fd-4268-9164-6240a97ff44d%22%20WHERE%20%22DEPARTURE_PORT%22%20!=%20%27%27%20GROUP%20BY%20%22DEPARTURE_PORT%22%20ORDER%20BY%20count%20DESC%20LIMIT%2010

The API will respond in JSON which makes it very easy to parse the response.

Here we have an example using AJAX to fetch the JSON from the API and parse it in to a response

   $.ajax({
    url: 'https://data.gov.au/api/action/datastore_search_sql?sql=SELECT%20count(%22DEPARTURE_PORT%22)%20as%20count,%20%22DEPARTURE_PORT%22%20from%20%228d3e12fa-a9fd-4268-9164-6240a97ff44d%22%20WHERE%20%22DEPARTURE_PORT%22%20!=%20%27%27%20GROUP%20BY%20%22DEPARTURE_PORT%22%20ORDER%20BY%20count%20DESC%20LIMIT%2010',
    dataType: "jsonp",
    success: function( response ) {
		var output = "Top 10 departure ports\n";
		$.each(response.result.records, function(k,v) {
			output += v.DEPARTURE_PORT + ": " + v.count + "\n";
		});
		alert(output);
    }
});
You can execute the example code with the button bellow:

Licence information

Our open data collection is covered by the Creative Commons Attribution 4.0 International licence.
More information on this licence can be found here.