I’m going to extend my garage system so that when I leave the garage in the morning, the dot matrix display shows the next train departure time. Then I’ll know whether I can take it easy or have to drive like a lunatic because I’m late.
The web site I’m contemplating scraping produces some fairly simple HTML from the realtime live departure boards, and it probably wouldn’t be very difficult to hard-code a search for various ‘signatures’, but I wondered whether anyone else had explored HTML parsing.
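For what it’s worth, here is a rough Python sketch of what the ‘signature’ search could look like when prototyped off the device. The sample markup, the pattern and the name `find_departures` are all made up for illustration; the real page will use different HTML, so the pattern would need adjusting.

```python
import re

# Hypothetical markup, NOT copied from any real departure board.
SAMPLE_HTML = """
<ul>
<li><strong>07:42</strong> London Waterloo <em>On time</em></li>
<li><strong>07:58</strong> Reading <em>Expected 08:03</em></li>
</ul>
"""

# The "signature": a bold HH:MM time, a destination, then a status in <em>.
DEPARTURE_RE = re.compile(
    r"<strong>(\d{2}:\d{2})</strong>\s*([^<]+?)\s*<em>([^<]+)</em>"
)


def find_departures(html):
    """Return (time, destination, status) tuples in page order."""
    return DEPARTURE_RE.findall(html)


for time, destination, status in find_departures(SAMPLE_HTML):
    print(time, destination, status)
```

A pattern like this is quick to write but brittle: any change to the page layout silently produces an empty list, so it is worth treating "no matches" as an error rather than "no trains".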
I’ve applied for a license/token, but I doubt I’ll get one, and even if I do, they will probably want some revenue. I didn’t want to roll my own SOAP/XML and process the WSDL, etc.
However, there are a number of web sites that scrape the National Rail live departure boards and present relatively simple HTML. Examples include livetrains.co.uk and traintimes.org.uk.
If I were doing it, I’d probably have the Imp call an external server that does the scraping via curl and PHP and returns just the scraped info as JSON, which the Imp can easily process for display. Not sure what your setup is though!
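To give an idea of the server side, here is a minimal sketch. It uses Python rather than curl and PHP, purely for illustration. The names `fetch_page` and `to_imp_json`, the JSON field names and the `limit` parameter are my own assumptions, not anything the Imp or the rail sites define.

```python
import json
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download a page as text (the 'curl' step). Not called in this sketch."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def to_imp_json(departures, limit=3):
    """Pack (time, destination, status) tuples into a compact JSON string."""
    rows = [
        {"time": t, "dest": d, "status": s}
        for t, d, s in departures[:limit]
    ]
    # Compact separators keep the payload small for the device.
    return json.dumps({"departures": rows}, separators=(",", ":"))


print(to_imp_json([("07:42", "London Waterloo", "On time")]))
```

Capping the list with `limit` means the device only ever receives the next few departures, which is all a small display can show anyway.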
I wanted to keep all the processing in the Imp, as I’m finding it very capable. I will probably consider the server approach, certainly if I get the LDB webservice access. Meanwhile, I’ll experiment with traintimes.org.uk (it has some nice URL constructs) and scraping/parsing the HTML.
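Before committing to anything on the device, the parsing logic could be prototyped on a PC. The sketch below uses Python’s standard `html.parser` to collect table cells row by row. The sample table is invented and is not the actual traintimes.org.uk markup, and `RowCollector` and `parse_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample, for illustration only.
SAMPLE = (
    "<table><tr><th>Time</th><th>To</th></tr>"
    "<tr><td>07:42</td><td>London <b>Waterloo</b></td></tr></table>"
)
print(parse_rows(SAMPLE))
```

Unlike a fixed text signature, this tolerates extra inline tags inside a cell, at the cost of more code to port to the Imp.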
This site explains that it is free up to 5 million accesses per 4-week period. I’m in the U.S., so I don’t know what the “Darwin” service is. But is that what you would use to find a departure from a station and “on time” status?
Take a look at Kimono Labs. It lets you turn webpages into structured API data. (I haven’t played with it myself, so I can’t provide any guidance beyond “some very smart people I know use and like this service”.)
@norman …
APIs are developed by website owners as a means to share or provide information in a controlled manner. Some are free, some require a fee. People seem to think that whatever is on the internet is free for the taking. They feel they can just write scripts to copy content, or copy and paste anything from any website. Whatever happened to ethical practices and respect for the information and content people are willing to provide? I should do an experiment and copy everything off the loginworks website and post it on my own website. I wonder what they would think about that? I imagine they would not be very happy about it. It’s too bad that people can’t be creative enough to come up with their own ideas and content. Loginworks is an Indian company … hmmm.