Hacker News | dchuk's comments

This is super interesting to me, is there anything else you can share about how you approached this? In my scraping Google experience I have found roughly the same thing where once you've passed the captcha test, you can scrape a lot more.

Were you scraping with real browsers or something like Mechanize/Curl? Rate limiting at all? Proxies or real servers?


We had a pool of IPs we were leasing on our own, proxy services get abused and are poisoned.

We didn't rate limit. We would just increase the size of the cookie pool if a captcha was hit, which was rare: we would scrape n pages per session and stop at a threshold, to prevent that session from being captcha'ed so we wouldn't have to solve a captcha for it. We had two pools, the primary pool and the "chilling" pool. Cookies near their captcha life would cool off for a few hours before returning to the active pool, which behaves just like any other resource pool: every page scraped would "borrow" a cookie out of the pool, customize the encrypted location key, and make the request with a common user agent string.
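A minimal sketch of the two-pool idea described above, in Python rather than the Erlang the commenter used. The class name, method names, thresholds, and the opaque "session" values are all hypothetical; this is not the commenter's code, and it makes no network requests.

```python
import time
from collections import deque


class SessionPool:
    """Two-pool resource manager: an active pool and a cool-down ("chilling") pool."""

    def __init__(self, sessions, max_uses, cooldown_seconds, clock=time.monotonic):
        self.active = deque((s, 0) for s in sessions)  # (session, uses so far)
        self.chilling = deque()                        # (session, time it may return)
        self.max_uses = max_uses                       # assumed per-session threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock

    def _revive(self):
        # Move sessions whose cool-down has elapsed back into the active pool.
        now = self.clock()
        while self.chilling and self.chilling[0][1] <= now:
            session, _ = self.chilling.popleft()
            self.active.append((session, 0))

    def borrow(self):
        """Take a (session, uses) pair out of the active pool."""
        self._revive()
        if not self.active:
            raise LookupError("no active sessions; grow the pool or wait")
        return self.active.popleft()

    def give_back(self, session, uses):
        """Return a session after one use; rest it once it hits the threshold."""
        uses += 1
        if uses >= self.max_uses:
            self.chilling.append((session, self.clock() + self.cooldown_seconds))
        else:
            self.active.append((session, uses))
```

Passing the clock in as a parameter is only there to make the cool-down testable without sleeping.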

Scaling it was difficult but once we had it figured out, Erlang was invaluable to us and our dependence on IPs dropped once we figured out the cookie methodology.

Solving captchas is cheaper than renting IPs.


Thank you for the reply! Sorry, but one last question: can you share any more about this statement: "customize the encrypted location key"?

When you set the location in Google it customizes the cookie with a named field that is a capital L I believe. That field is encoded or encrypted and I could never figure it out so I just constructed a rainbow table by using phantomjs to set a location and scrape the cookie out, pairing the known location value with the encrypted value so that we could customize "the location of the search".

Oh and I just used a generic HTTP request client in Erlang and xpath / HTML parser to extract what we needed.

Interesting how this is written as a traditional press release rather than software release notes like most other open source projects

Here's the changelog: https://raw.githubusercontent.com/nodejs/node/v6.x/CHANGELOG...

Notable changes summarized here: https://github.com/nodejs/node/pull/6383


That's because this IS an announcement blog post, in line with the others at https://nodejs.org/en/blog/announcements/. There is an actual changelog.

That's because it is a press release. There is a changelog and release notes too, it just happens that whoever submitted this chose to link to a press release instead.

So if I'm using dlite now, and I want to transition to Docker for Mac once I get into the Beta...what do I need to do? Fully uninstall dlite? Can they be run side by side? (assuming no)

I am not sure if they conflict. There may be an issue with them both trying to use the same docker socket, but you can probably just start one after you stop the other.

same as here, dchuk. Thanks!

Isn't this just promoting the concept of a well written wiki? Linking documents together IS a graph, whether you store it in a graph db or not. Graph DBs shine when it comes to graph queries, storing documents that are well linked can be handled with any old relational DB

One advantage of a graph database implementation over a traditional wiki implementation is in regard to querying along edges as opposed to following links. Which is to say that the edges of a graph database will tend to have richer semantics than a wiki's generic hyperlinks [but it is not to say that a graph database is necessarily better].

To put it another way, at best a link from a wiki page gets implied importance based on its position within a document. A graph database edge can have many properties including "weight" or "relevance". Of course these could be added to wiki markup as properties, but there is a point at which the right tool is better than layered make-do's.
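As a rough illustration of querying along edges that carry properties, here is a small in-memory sketch in Python. It is not a graph database; the `Edge` and `Graph` classes, the `kind` labels, and the weights are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    kind: str            # hypothetical label, e.g. "example_code" or "grammar"
    weight: float = 1.0  # hypothetical relevance score


@dataclass
class Graph:
    edges: list = field(default_factory=list)

    def link(self, source, target, kind, weight=1.0):
        self.edges.append(Edge(source, target, kind, weight))

    def neighbors(self, source, kind=None, min_weight=0.0):
        """Targets reachable from `source`, filtered by edge properties."""
        return [
            e.target
            for e in self.edges
            if e.source == source
            and (kind is None or e.kind == kind)
            and e.weight >= min_weight
        ]


g = Graph()
g.link("intro", "foo", kind="example_code", weight=0.9)
g.link("intro", "bar", kind="grammar", weight=0.4)
g.link("intro", "baz", kind="example_code", weight=0.2)

# Only follow edges that are example code and reasonably relevant.
strong_examples = g.neighbors("intro", kind="example_code", min_weight=0.5)
```

A plain hyperlink would correspond to an edge with no `kind` and no `weight`, which is the distinction the comment is drawing.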


I don't see it as that different. A wiki page typically assigns human-readable properties through its text. You don't write

  <a href="foo.html">Read more</a>
  <a href="bar.html">Read more</a>
but:

  <a href="foo.html">Example code</a>
  <a href="bar.html">Grammar</a>
If desired, one can even make this easier to parse programmatically:

  <a href="foo.html" class="ExampleCode">Hello world example</a>
  <a href="bar.html" class="GrammarNotes">Grammar</a>
That allows for styling different types of links differently. Also, a wiki's backend software could extract separate indexes for sample code, grammar fragments, links to compiler source, etc. You could also easily include an attribute for, e.g., click-through rate and, if desired, style links accordingly.
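A sketch of how a backend might build such per-type indexes from the markup, using Python's standard `html.parser`. The `LinkIndexer` class and the `"untyped"` bucket are names invented for this example; the sample page reuses the links above plus one unclassified link.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkIndexer(HTMLParser):
    """Collects href values of <a> tags, grouped by their class attribute."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # A class attribute may hold several space-separated names;
        # links without one go into a catch-all bucket.
        for name in (attributes.get("class") or "untyped").split():
            self.index[name].append(href)


page = """
<a href="foo.html" class="ExampleCode">Hello world example</a>
<a href="bar.html" class="GrammarNotes">Grammar</a>
<a href="baz.html">Read more</a>
"""

indexer = LinkIndexer()
indexer.feed(page)
example_links = indexer.index["ExampleCode"]
```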

In fact, the difference between a classical wiki and a graph database is, IMO, an implementation choice. If you want to process complex queries rapidly or a lot of your attributes aren't textual or aren't intended primarily for visual display, a graph database is more appropriate. If you want to serve standard web pages rapidly, storing the content as HTML may be more appropriate, even if that means that processing queries such as "give me all links to source code examples" runs way more slowly.


Every computation and/or representation is isomorphic to some (hypothetical) graph.

Could you clarify: What are the nodes and edges? May the graph contain cycles? May the graph be infinite? If the graph is infinite, do we need a distinguished initial node? Do non-isomorphic graphs always correspond to different computations, or do you have a coarser notion of equivalence (say, graph homeomorphism)?

I've read papers where the space of possible computations for a given (typically, toy, even loop-free) nondeterministic concurrent program is represented as a directed topological space (see “directed algebraic topology” for more information), but these spaces are more general than directed graphs.


I don't mean directed graphs, only vertices and edges.

I don't think you would need to deal with infinite graphs, as "arbitrarily large" should be able to cover all (finite) possible computations and/or representations.

My guess is that every graph may correspond to an infinite number (though, not necessarily every) of possible computations and/or representations, depending on which "correspondence function" is being used to analyze the graph (though there may be additional "correspondences" between those functions and/or their results).


At the very least, answer this: What are the nodes and edges in your graph? What information is associated to them?

Broadly speaking, in graph reduction the nodes are functions and values and the edges are steps in the computation that can be taken to apply functions to those values. After application that node becomes a new value, etc. until the program is evaluated or terminated.

https://en.wikipedia.org/wiki/Graph_reduction https://en.wikipedia.org/wiki/Abstract_semantic_graph

Generally they're acyclic but some types of ASGs can represent recursive functions as cycles, so they are distinct from trees.
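A toy sketch of graph reduction in Python, under one common reading where edges point from a function application to its argument nodes. The `Node` class and `reduce_node` function are hypothetical names, and the sketch only covers acyclic graphs; it does not handle the recursive (cyclic) case mentioned above.

```python
import operator
from dataclasses import dataclass


@dataclass
class Node:
    """Either a value (func is None) or a function applied to child nodes."""
    value: object = None
    func: object = None
    args: tuple = ()


def reduce_node(node):
    """Reduce a node in place to a value and return that value."""
    if node.func is not None:
        result = node.func(*(reduce_node(arg) for arg in node.args))
        # Overwrite the application node with its result, so any other
        # node that shares it sees the value without recomputing it.
        node.value, node.func, node.args = result, None, ()
    return node.value


two = Node(value=2)
shared = Node(func=operator.add, args=(two, two))      # 2 + 2
root = Node(func=operator.mul, args=(shared, shared))  # (2 + 2) * (2 + 2)

answer = reduce_node(root)
```

Because `shared` is referenced twice, the structure is a graph rather than a tree, and the addition is only performed once.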


Thanks for your actually useful answer.

I don't think there's any reason to be judgy about the other commenter. I understand you wanted a rigorous definition but there are better ways to ask for it than making unilateral demands and using negative terms like "disappointing" and "annoying". You may be well versed in these subjects but it is possible to be technically correct and gracious at the same time. I think you're capable of doing so and you would probably find more fruitful discussions by taking that path.

What did you not find useful about my numerous answers?

Basically, everything is in some way equivalent to some graph. Since anything is an example, it's difficult to say much without using an example, which then necessarily constrains the discussion.

It would be more useful if you presented an example computation or representation, then I could show you how to make and reverse an equivalent graph. (Which then might show you what information might be associated with nodes and edges.)


We'd need to be talking about a concrete example to say with much specificity, but a node is basically a "thing" and an edge is a binary relationship between two things. A thing may have relationships with itself.

The precise "information" associated to them could be "anything", but would be prescribed by the computation and/or representation being discussed and the graph in question.


How disappointing. I was expecting something more concrete, like “every node is a sequence point and every edge is a possible transition” or “every node is a point at which a nondeterministic event occurs and every edge corresponds to a causal relation between events”. But if you just say “a computation is a graph” and give no further details, then there's no actual benefit to modeling computations as graphs.

I'm even more annoyed by your use of the word “isomorphic”, which, FYI, doesn't mean “vaguely similar in some way I can't articulate”. It only makes sense to speak of two mathematical objects being “isomorphic” when they belong in the same category. What category do you have in mind, that includes both computations and graphs as objects, and what are the morphisms between them?


Disappointing? You are the one coming to me expecting an example-oracle that produces them on demand.

E.g.

Computation / representation:

    2 + 2
Graph:

       [+]
      /   \
    [2]    [2]
There is a way to produce the graph from the computation / representation and vice versa. There is a way to do that for every possible computation / representation and every possible graph (though, not necessarily with every pair).

This could be described as a compiler or compiler-like. There are many interesting things to do with them, but in this example all that is needed to consider is the construction of an Abstract Syntax Tree from a string and an in-order tree walker that emits the contents of the nodes to an output string.
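For what it's worth, the round trip described here can be sketched with Python's standard `ast` module. The `emit` function and `SYMBOLS` table are names invented for this example, and only constants and four binary operators are handled.

```python
import ast

# Operator node types mapped to their source symbols (only these are handled).
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def emit(node):
    """In-order walk: left subtree, then the operator, then the right subtree."""
    if isinstance(node, ast.BinOp):
        return f"({emit(node.left)} {SYMBOLS[type(node.op)]} {emit(node.right)})"
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("2 + 2", mode="eval").body     # string -> tree
text = emit(tree)                               # tree -> string
round_trip = ast.parse(text, mode="eval").body  # and back to a tree
```

The walker adds parentheses so that nesting survives, which means the emitted string is not character-for-character the input, but parsing it again yields the same tree.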


(0) You said “I don't mean directed graphs, only vertices and edges.” But a syntax tree, viewed as a graph, is very much a directed one - the relation between a parent node and its children is asymmetric.

(1) A syntax tree isn't the same thing as a computation. Of course, you can produce computations by interpreting syntax trees, but: (a) It's perfectly possible for two distinct syntax trees to produce the same computation. [Say, by renaming all variables.] (b) It's also perfectly possible that interpreting the same syntax tree twice will produce completely different computations! [Say, if your language isn't pure.] Unsurprisingly, a syntax tree is a representation of syntax, not computation.

(2) Yes, I'm disappointed, because I expected your observation that “every computation (...) is isomorphic[sic] to some graph” to provide more insight than it turned out to.
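Points (1a) and (1b) can be illustrated with a short Python sketch using the standard `ast` module. The expressions and the `counter` name are made up for the example, and "same computation" is only spot-checked on a range of inputs, not proven.

```python
import ast
import itertools

# (a) Two syntax trees that differ only in a variable name.
tree_x = ast.parse("lambda x: x + 1", mode="eval")
tree_y = ast.parse("lambda y: y + 1", mode="eval")
trees_differ = ast.dump(tree_x) != ast.dump(tree_y)

f = eval(compile(tree_x, "<x>", "eval"))
g = eval(compile(tree_y, "<y>", "eval"))
same_behaviour = all(f(n) == g(n) for n in range(100))

# (b) One syntax tree, evaluated twice against mutable state.
impure = compile(ast.parse("next(counter)", mode="eval"), "<impure>", "eval")
env = {"counter": itertools.count()}
first = eval(impure, env)
second = eval(impure, env)
```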

Edit: Turned paraphrasing into literal quote.


Placing [sic] into a paraphrasing is just misleading.

0) You asked, "What information is associated to [nodes and edges]?". In this example, one piece of information is the direction of an edge.

1) I don't think I claimed they were the same. In fact, I am claiming that one can be represented by the other and vice versa. a) Yes, that is part of the reason I conjecture that each graph may be associated with an "infinite number" of computations and/or representations. b) Also a reason: there are perhaps an infinite number of languages, interpreters, compilers, etc. (some of which may produce the same result) for a given graph.

2) It was only an observation that the comment I responded to claimed only a subset of the truth. Anything done on a computer can be considered as a graph.


The point isn't necessarily one graph representation, but simply that any problem can be looked at in a graphy manner if you try hard enough, so "hey, it's a graph" isn't an interesting insight. But to answer your question, data flow graphs are a general representation of function composition, and Turing machines can be looked at as graphs with labeled edges. Yes, we're using "isomorphism" in the pop-compsci, Gödel-Escher-Bach sense.
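To make the "graph with labeled edges" view concrete, here is a small Python sketch in which states are nodes and each transition is an edge labeled with the symbol read, the symbol written, and the head move. The bit-flipping machine, the `EDGES` table, and the `run` function are all invented for illustration.

```python
# Each edge: (state, symbol read) -> (next state, symbol written, head move).
EDGES = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", "_"): ("halt", "_", 0),
}


def run(tape, state="flip", blank="_", max_steps=1000):
    """Follow labeled edges from `state` until the 'halt' node is reached."""
    cells = dict(enumerate(tape))
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells.get(head, blank)
        state, cells[head], move = EDGES[(state, symbol)]
        head += move
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


result = run("1011")
```

This machine just inverts every bit and halts at the first blank; the step limit is a guard for the sketch, not part of the model.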

Their product is advertising, their delivery mechanism is television content.

-----


A TV network's product is television content, their monetization strategy is advertising. The delivery mechanism is cable / airwaves.

Sure there are secondary economic effects and markets that complicate the picture. But the basic business model of a television network isn't all that different than a regular company that sells their products directly.

-----


Yes, but the product being sold is broadcast time slots, not audio/video.

-----


If that's true, then what's their programming called?

Programming is their product, and customers buy it by paying attention (or by paying Netflix, Hulu, Comcast, etc. more directly). Attention is the currency, and ads let them exchange attention for USD.

-----


So AMC and HBO are making different things? No, of course not. They both produce content and distribute it through cable networks. One of them charges advertisers to deliver the product to viewers, while the other one charges the viewers directly.

To be fair, if you're an advertiser, then the product is eyeballs. But if you're a pair of eyeballs, then the product is the content. They're not mutually exclusive. Same with Google. If you're a searcher, then the product is results. If you're an advertiser, then the product is relevant searchers.

-----


They don't even make the ads. Their product, like Google's, is eyeballs.

-----


I've seen your timeblock.com site a few times because of these threads, and every time I click on the link and end up back on your landing page I get so frustrated because your site does such an amazing job of not actually saying anything about the product. What is your product? What does it look like? How do I use it? How does it compare to other products? Why would I pay for it? What am I paying for?

-----


The product is a methodology that helps makers and managers align their expectations and plans giving makers more time in flow and managers better overview.

When you use the app/methodology you get less stress, less chaos, more creativity and better planning leading to more revenue and better quality of your products.

One user said "it's like scrum, just more humane"

You would pay for it if you wanted more transparency and improved communication between you, your co-founders, your employees and your customers.

It is also an app that supports the methodology, it currently integrates with Jira, and we are working on Asana and TeamWork.

I apologize for the missing info on the website, we are shooting explainer videos today!

-----


Congrats on the biz Mike, hope you're doing well :)

-----


Thanks dchuk, you too! Gearing up for a "go big or go home" kind of attack soon... definitely can't complain!

-----


One of the best mac apps in existence but holy jeebus is the naming a mess:

- It's called iTerm2 Version 3 now, rather than iTerm3
- It's called iTerm2 Version 3 now, but the actual app version is 2.9

-----


The app's name is iTerm2. Version numbers have to increase lexicographically, so the version 3 beta is called 2.9. I'm open to suggestions :)

-----


In SemVer [1], it's prescribed that the first number is for breaking changes, the second number for new features, and the third number for fixes. So this would be v3.0.0.

It also allows you to add a suffix though for beta-type versions, so the beta would be v3.0.0-beta, or v3.0.0-rc1, etc.

[1] http://semver.org/

-----


I want the first "stable" release to be 3.0.0, which is lexicographically before 3.0.0-beta. Admittedly this is only a limitation of Sparkle, but I just haven't prioritized modifying Sparkle above fixing real bugs.
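The mismatch between the two orderings can be shown with a short Python sketch. Plain string sorting stands in for the lexicographic ordering described above (it is not Sparkle's actual comparison code), and `semver_key` is a hypothetical helper that follows the SemVer precedence rules, accepting an optional leading "v" as a convenience and ignoring build metadata.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def semver_key(version):
    """Sort key following SemVer precedence (build metadata not handled)."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a SemVer string: {version!r}")
    major, minor, patch, pre = match.groups()
    core = (int(major), int(minor), int(patch))
    if pre is None:
        # A release outranks every pre-release of the same core version.
        return core + (1, ())
    identifiers = tuple(
        # Numeric identifiers compare as numbers and sort below alphanumeric ones.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in pre.split(".")
    )
    return core + (0, identifiers)


versions = ["3.0.0", "3.0.0-beta", "3.0.0-rc1", "2.9.0"]
as_strings = sorted(versions)                 # lexicographic ordering
as_semver = sorted(versions, key=semver_key)  # SemVer precedence
```

Under string ordering `3.0.0` lands before `3.0.0-beta`, while under SemVer precedence the beta and release candidate come first and `3.0.0` comes last.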

-----


SemVer works great for libraries, but it falls apart for actual software IMO. How would you define a "breaking change" in an application? Would it be a complete redesign? Or maybe a feature removal (does that ever happen?)

Versions are supposed to convey how much has changed in the app between two versions. iTerm got soooooo many new features in this new version that anything smaller than a major bump would feel out of place.

I do agree the actual version should be v3.0.0-beta1 tho ^^.

-----


AppleScript isn't backwards compatible, sounds like that would fit just about any definition of a breaking change.

-----


> SemVer works great for libraries, but it falls apart for actual software IMO

SemVer is for APIs. From the spec:

> 1. Software using Semantic Versioning MUST declare a public API.

-----


Yes, the public API for a library that the developer interfaces with. Not a web API.

-----


I think it should be called iTerm 3. George, thank you SO MUCH for your work on this excellent app.

-----


Call it iTerm3

-----


...because it costs money to run it? And it costs money to hire good talent to maintain it?

-----


Yes, but I would imagine that cost is not too much. Why not make it into something like a public resource? Often that approach will generate more wealth for the greater community.

Often the for-profit model and fiduciary responsibility can be constraining in wealth creation in general.

-----
