Unimelb COMP20008 Note 2019 SM1 - Data formats
2022-06-24 15:18:00 【403 Forbidden】
Lectures 2 and 3: Data formats
Relational Databases
-Appreciate the role that relational databases play in data wrangling.
- Structured data is like the data in a relational database
- Structured: Relational databases, CSV
- Unstructured: text
- Semi-structured: HTML,XML,JSON
- Advantages:
- Easier to analyse, easier to query
- Easier to store
- Easier to clean, maintain consistency and security, especially with multiple users
- Regularity
- Relational databases, the classic method of storing structured data (banking, sales, airlines …)
- Data stored in tables, each row is a data item and columns describe attributes of the data item
- Can query the data using a high-level language such as SQL
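As a minimal sketch of rows, columns and an SQL query, the following uses Python's built-in sqlite3 module with an in-memory database. The `sales` table, its columns and its rows are invented for illustration and do not come from the lecture.

```python
import sqlite3

# Hypothetical table and rows, made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 35.5), ("Alice", 80.0)],
)

# Each row is one data item; each column is an attribute of that item.
# The query states what is wanted, not how to compute it.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
conn.close()
```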
Regular Expression
-be able to understand a regular expression using the operators
. ^ $ * + | [ ]
-be able to formulate a regular expression using the above operators, based on an English description
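The operators listed above can be tried out with Python's re module. This is a small sketch: the English descriptions and test strings are made-up examples, and `matches` is a hypothetical helper name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# English description -> regular expression, using only . ^ $ * + | [ ]
examples = {
    "one capital letter followed by zero or more lowercase letters": r"^[A-Z][a-z]*$",
    "exactly 'cat' or exactly 'dog'": r"^cat$|^dog$",
    "'a', then any single character, then 'c'": r"^a.c$",
    "one or more digits and nothing else": r"^[0-9]+$",
}

for description, pattern in examples.items():
    print(pattern, "<-", description)
```

Here ^ and $ anchor the match to the start and end of the string, so `^[0-9]+$` rejects both the empty string and "20a19".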
CSV
-be able to explain what is a CSV file, what is a spreadsheet, what is the difference?
- Spreadsheet
- A spreadsheet is a file made of rows and columns that helps to sort data, arrange data easily, and calculate numerical data
- CSV
- Spreadsheet
- Easy to use
- Structured, but not like a relational DB
- Differences
- CSV files are human-readable
- CSV files lack formatting information
- CSV is a plain text format in which values are separated by commas, while a spreadsheet file is typically a binary format that stores both content and formatting.
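Because CSV is plain text, it can be read with Python's built-in csv module. The sketch below parses a small made-up CSV string; the column names and values are assumptions for illustration.

```python
import csv
import io

# Hypothetical CSV content. A real file would be opened with open(path, newline="").
raw = "name,campus,score\nAnna,Parkville,81\nPeter,Southbank,67\n"

# The first line is used as the header; each later line becomes one dict.
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value arrives as a string: CSV stores no type or formatting information,
# so numeric columns have to be converted explicitly.
scores = [int(row["score"]) for row in rows]
print(rows[0]["name"], scores)
```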
XML
-be able to explain the motivation for XML and XML namespaces
- A 'meta' mark-up language
- Extensible: users define their own tags
- Facilitates better encoding of semantics
- It's beneficial to reuse parts from existing, well-designed schemas
- Allows search engines or other tools to operate over a range of documents that differ in many respects but use common names for common element types
- XML namespaces are based on the use of qualified names, which contain a single colon separating the name into a namespace prefix and a local name. The prefix, which is mapped to a URI, selects a namespace.
- The combination of the universally managed URI namespace and the local schema namespace produces names that are guaranteed to be universally unique
-be able to explain the differences between XML and HTML
- HTML tags are predefined, whereas XML tags are user-defined.
- HTML has a limited set of tags, whereas XML tags are extensible.
- HTML tags are case-insensitive, whereas XML tags are case-sensitive.
- HTML tags are meant for displaying data, not describing it, whereas XML tags are meant for describing data.
- HTML focuses on how data looks, whereas XML focuses on what data is.
-be able to explain the difference between XML attributes and elements and describe situations in which the use of one is preferred over the other
- XML element
- An XML element is everything from (including) the element's start tag to (including) the element's end tag
- <element></element>
- An element contains:
- Text
- Attributes
- Other elements
- Or a mix of the above
- XML Attribute
- Attributes are part of XML elements
- Attributes define properties of elements
- Attribute is always a name-value pair
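The difference between element content and attributes can be seen by parsing a small document with Python's built-in xml.etree.ElementTree. The `<university>` document below is a made-up example that stores a campus name once as element text and once as an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
doc = """<university>
  <campus>Parkville</campus>
  <campus location="Southbank"/>
</university>"""

root = ET.fromstring(doc)
first, second = root.findall("campus")

print(first.text)     # text content of the first element
print(second.attrib)  # attributes of the second element as name-value pairs
print(second.text)    # the empty element has no text content
```

An attribute holds a single text value, so data that may repeat or contain nested structure generally has to be modelled as an element.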
-be able to create XML documents, based on a natural language specification
-be able to both create and understand XML documents that use XML namespace syntax
- Default Namespace:
- Using a namespace, you can define the context in which names are defined. In essence, a namespace defines a scope
- xmlns="namespaceURI"
- xmlns="http://info.gov.uk"
- Defining a default namespace for an element also saves us from using prefixes in all the child elements
- Using prefix
- Name conflicts in XML can be resolved by using a prefix bound to a namespace.
- It provides a method to avoid element name conflicts.
- In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
- A user or an XML application will not know how to handle these differences.
- xmlns Attributes
- When using prefixes in XML, a namespace for the prefix must be defined.
- The namespace can be defined by an xmlns attribute in the start tag of an element.
- The namespace declaration has the following syntax.
- xmlns:prefix="URI"
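As a sketch of both a default namespace and a prefixed namespace, the document below contains two different `table` elements. The example.com URIs are made-up identifiers (they are never fetched), and the short prefixes in `ns` are arbitrary names chosen for the lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <table>.
doc = """<root xmlns="http://example.com/default"
      xmlns:f="http://example.com/furniture">
  <table>rows and cells</table>
  <f:table>four legs</f:table>
</root>"""

root = ET.fromstring(doc)

# ElementTree expands every qualified name to {namespaceURI}localname,
# so the two <table> elements no longer share a name.
tags = [child.tag for child in root]
print(tags)

# Prefixes used when searching are local to this mapping; only the URIs matter.
ns = {"d": "http://example.com/default", "f": "http://example.com/furniture"}
default_table = root.find("d:table", ns).text
furniture_table = root.find("f:table", ns).text
print(default_table, "|", furniture_table)
```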
-be able to explain the purpose of XML namespaces and list reasons for why it is useful
- XML Namespaces provide a method to avoid element name conflicts.
- To group elements relating to a common idea together
-understand what is meant by well-formed XML and valid XML
- Syntax-Well-formed
- XML files should begin with an XML declaration (optional in XML 1.0, but recommended)
- <?xml version="1.0"?>
- XML elements
- Start/end tags or empty tags
- Attributes in quotes
- <campus>Parkville</campus>
- <campus location="Parkville"/>
- Appropriately nested
- One root element
- Comments
- <!--comments do not affect the document, it's not part of the data that you want to represent-->
- Some characters have special meaning
- '<' and '&' are strictly illegal inside element content unless escaped as &lt; and &amp;
- A CDATA (character data) section may be used inside an XML element to include large blocks of text that contain special characters such as & and <
- The XML standard states that an XML document that conforms to the standard is said to be "well-formed". The standard defines many syntax, grammar and structure rules: an XML document must have a single root element, elements must be properly nested, tag names cannot begin with a number or contain certain characters, and so on.
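One way to check well-formedness is simply to try parsing the text. The sketch below uses Python's built-in xml.etree.ElementTree, which raises ParseError on malformed input; `is_well_formed` is a hypothetical helper name. It checks syntax only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<campus>Parkville</campus>"))     # properly closed element
print(is_well_formed("<a><b></a></b>"))                 # badly nested tags
print(is_well_formed("<a/><b/>"))                       # more than one root element
print(is_well_formed("<a>1 < 2</a>"))                   # unescaped '<' in content
print(is_well_formed("<a><![CDATA[1 < 2 & 3]]></a>"))   # special characters inside CDATA
```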
- XML schema & validation
- An XML file can be well-formed and NOT valid; it is valid if it is consistent with a particular schema.
- XML Schema languages, examples:
- XSD (XML Schema Definitions): a W3C standard
- DTD (Document Type Definitions)
- HTML5 schema for Web browsers <!DOCTYPE html>
- Validation Tools (schema checking software)
- local XML editors (XMLWriter, Editix, Liquid XML … )
- online validators: http://validator.w3.org/, https://www.xmlvalidation.com/index.php
- lxml (Python library)
- Validity is distinct from well-formedness. An XML document is said to be valid if it is associated with a document type definition (DTD) or an XML schema, and complies with the constraints specified in that DTD or schema.
-be able to explain the difference between XML and JSON and applications where each is suited
- JSON is simpler and more compact/lightweight than XML. Easy to parse.
- Common JSON application – read and display data from a webserver using JavaScript
- XML comes with a large family of other standards for querying and transforming (XQuery, XML Schema, XPATH, XSLT, namespaces, …)
- XML stands for "Extensible Markup Language" and is written with tags in a way similar to HTML, whereas JSON stands for "JavaScript Object Notation"; it is based on a subset of JavaScript syntax but is language-independent.
- XML allows complex schema definitions (via regular expressions)
- allows formal validation
- makes you consider the data design more closely
- JSON is more streamlined, lightweight and compact
- Which appeals to programmers looking for speed and efficiency
- Widely used for storing data in NoSQL databases
JSON
-be able to read and create documents using JSON
- Syntax
- Object data is in name/value pairs
- "firstName":"John"
- JSON values
- A number (integer or floating point)
- A string (in double quotes)
- A Boolean (true or false)
- An array (in square brackets)
- An object (in curly braces)
- Null
- JSON Objects
- {"firstName":"John", "lastName":"Doe"}
- JSON Arrays
- "employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]
- These objects repeat recursively down a hierarchy as needed.
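Reading and creating JSON can be sketched with Python's built-in json module. The document below reuses the employee names from the notes; the `active` and `manager` fields are assumptions added to show the Boolean and null values.

```python
import json

# JSON text: an object holding an array of objects, a Boolean and a null.
text = """{
  "employees": [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"}
  ],
  "active": true,
  "manager": null
}"""

# Reading: objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
data = json.loads(text)
print(data["employees"][1]["firstName"])

# Creating: build ordinary Python values, then serialise them.
data["employees"].append({"firstName": "Peter", "lastName": "Jones"})
output = json.dumps(data, indent=2)
print(output)
```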
-be able to convert an XML document to JSON and vice versa
- For the JSON syntax, see the previous point
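There is no single standard mapping from XML to JSON, so any conversion has to pick a convention. The sketch below assumes one possible convention, chosen here and not taken from the lecture: attributes become keys prefixed with "@", mixed text goes under "#text", and repeated child tags become an array. The `<employees>` document and the `element_to_dict` helper are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert one element using the assumed '@attribute' / '#text' convention."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag: collect the values in a list (a JSON array).
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text          # text-only element becomes a plain string
        node["#text"] = text     # text alongside attributes or children
    return node

xml_doc = """<employees>
  <employee id="1"><firstName>John</firstName><lastName>Doe</lastName></employee>
  <employee id="2"><firstName>Anna</firstName><lastName>Smith</lastName></employee>
</employees>"""

root = ET.fromstring(xml_doc)
converted = {root.tag: element_to_dict(root)}
as_json = json.dumps(converted, indent=2)
print(as_json)
```

Going the other way needs the same convention applied in reverse, which is why a conversion is only reversible when both sides agree on the mapping.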
-be able to explain the purpose of using schemas for XML and JSON data
- JSON Schema
- Written in JSON itself
- Describes the structure of other data
- Easy to validate a JSON document against its schema using a schema validator
- XML Schema
- Written in XML itself
- Schema is Extensible
- It is easier to describe allowable document content
- It is easier to validate the correctness of data
- It is easier to define data facets (restrictions on data)
- It is easier to define data patterns (data formats)
- It is easier to convert data between different data types
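To illustrate what a schema buys you, here is a toy checker for a JSON Schema written in JSON itself. It is a sketch that handles only three JSON Schema keywords ("type", "required" and "properties") and is not a conforming validator; real projects would use a dedicated schema validation library. The `validate` function and the person schema are assumptions for illustration.

```python
import json

def validate(instance, schema) -> bool:
    """Toy check of 'type', 'required' and 'properties' only."""
    types = {"object": dict, "array": list, "string": str,
             "number": (int, float), "boolean": bool}
    expected = schema.get("type")
    if expected is not None:
        if not isinstance(instance, types[expected]):
            return False
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(instance, bool):
            return False
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not validate(instance[name], subschema):
                return False
    return True

# The schema is itself a JSON document describing the structure of other data.
schema = json.loads("""{
  "type": "object",
  "required": ["firstName", "lastName"],
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {"type": "number"}
  }
}""")

print(validate({"firstName": "John", "lastName": "Doe"}, schema))
print(validate({"firstName": "John"}, schema))
```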