TTDS: Lab 0

This lab covers how to read a text file from disk. It is optional and intended for those who are not fully confident in their programming skills. There is nothing specific to complete beyond reading a text file from disk word by word, which is the most basic skill you need in order to take the course.

PROGRAMMING LANGUAGES

  • You need Perl or Python on your machine (you can use another language if you prefer).
  • If you are using DICE, both should already be available there. Check with the demonstrators how to run them.

DOWNLOAD A SAMPLE TEXT FILE

  • Download the following file, which has the text of the Bible: link

SKILLS TO DO WITH THE FILE

You should be comfortable with the following skills, in the programming language of your choice, when dealing with a text file:

  • Reading from and writing to text files
  • Reading text word by word, and calling functions to process each word where required (e.g. converting it to lower case)
  • Using regular expressions, which are very useful to know
  • Counting the number of occurrences of the words "lord", "to", and "36"
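
The skills listed above can be sketched in Python roughly as follows. This is only one possible approach, not a reference solution: the function names iter_words and count_targets are made up for illustration, the file is assumed to be UTF-8 encoded, and a "word" is assumed to be any run of ASCII letters or digits, so your counts may differ if you tokenise differently.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters or digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def iter_words(path, encoding="utf-8"):
    """Yield lower-cased words from a text file, reading it line by line."""
    with open(path, encoding=encoding, errors="replace") as handle:
        for line in handle:
            for match in TOKEN.finditer(line):
                yield match.group().lower()


def count_targets(path, targets=("lord", "to", "36")):
    """Count how often each target word occurs in the file (case-insensitive)."""
    counts = Counter(word for word in iter_words(path) if word in targets)
    # Counter returns 0 for targets that never occur.
    return {target: counts[target] for target in targets}
```

With the sample file saved as pg10.txt in the working directory, calling count_targets("pg10.txt") would return a dictionary with one count per target word.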

USEFUL TIPS

Python Tutorials: you can check one of these tutorials:

  • Tutorial 1: Code Academy
  • Tutorial 2: MIT

Useful Shell Commands
Print the frequency of each unique term in a collection, sorted by ascending count (the first command splits on spaces only; the second splits on any run of non-word characters):
- cat text.file | tr " " "\n" | tr "A-Z" "a-z" | sort | uniq -c | sort -n > terms.freq
- cat text.file | perl -p -e "s/[^\w]+/\n/g" | tr "A-Z" "a-z" | sort | uniq -c | sort -n > terms.freq
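
For comparison, a rough Python counterpart of the second pipeline might look like the sketch below. It is an approximation rather than an exact port: the function name term_frequencies is hypothetical, the input is assumed to be UTF-8, empty tokens are skipped (the shell version can emit and count blank lines), and ties in the count are broken alphabetically.

```python
import re
from collections import Counter


def term_frequencies(in_path, out_path):
    """Write 'count term' lines, sorted by ascending count, to out_path."""
    counts = Counter()
    with open(in_path, encoding="utf-8", errors="replace") as source:
        for line in source:
            # re.ASCII restricts \w to [A-Za-z0-9_].
            words = re.findall(r"\w+", line, flags=re.ASCII)
            counts.update(word.lower() for word in words)

    # Sort by count first, then by term so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    with open(out_path, "w", encoding="utf-8") as sink:
        for term, count in ordered:
            sink.write(f"{count} {term}\n")
    return counts
```

Calling term_frequencies("text.file", "terms.freq") would produce a file similar in spirit to the shell output, with the most frequent terms at the bottom.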

All Unix Shell Commands for Windows:
- Contact the course lecturers to get the link.
- Unzip the directory to a convenient location on your drive (e.g. c:\ or c:\program files\)
- Add the path of the "bin" directory to your Windows PATH: (example)

Files
pg10.txt (4.25 MB)
License
All rights reserved The University of Edinburgh
